The core of the collaborative secure localization lies in DL-based anomaly detection that extracts spatio-temporal features. The detection part consists of two components: a spatial feature extraction module based on the GAT and a temporal feature extraction module based on the VAE. The learned representations are used to build the reconstruction difference, which reflects the deviation between the expected and the collected ranging data and serves as an indirect indicator of anomalies.
4.3.1. Spatial Feature Extraction Module
To efficiently capture the feature correlations of the multidimensional ranging sequence in the spatial dimension, we propose to use a graph, $G = (V, E)$, to represent and learn spatial feature correlations, where $V$ is the set of nodes, each corresponding to a specific sensing UAV, and $E$ is the set of edges representing the spatial feature relationships between UAVs identified by the satellite. The feature vector of each sensing UAV $v_i$ is set as the delayed ranging sequence $x_i = [x_i^1, x_i^2, \ldots, x_i^w]$, and the basis of the feature extraction lies in the similarity between the ranging sequences, calculated as
$$\mathrm{sim}(x_i, x_j) = \frac{\sum_{k=1}^{w} x_i^k\, x_j^k}{\sqrt{\sum_{k=1}^{w} \big(x_i^k\big)^2}\,\sqrt{\sum_{k=1}^{w} \big(x_j^k\big)^2}}, \tag{6}$$
where $x_i^k$ and $x_j^k$ represent the $k$-th element of the vectors $x_i$ and $x_j$, respectively.
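For illustration, the following minimal sketch computes the pairwise similarity of Formula (6) for a small swarm; the array shapes and the helper name are assumptions made for this example rather than the paper's implementation.

```python
# Minimal sketch of Formula (6): cosine similarity between delayed ranging
# sequences. Shapes and names are illustrative assumptions.
import numpy as np

def ranging_similarity(X: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity; X has shape (num_uavs, window_len)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)   # guard against zero-norm rows
    return Xn @ Xn.T                        # sim[i, j] = cos(x_i, x_j)

X = np.random.rand(5, 20)    # 5 sensing UAVs, 20-step ranging window
S = ranging_similarity(X)    # (5, 5) similarity matrix
```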
Specifically, we define an adjacency matrix $A$ to represent the potential correlation between the ranging values of collaborative sensing UAVs associated with each edge. All elements of $A$ are initialized to 1 by default and are updated through a GAT model that explores the correlation factors between nodes in the graph data and learns to aggregate the potential spatial relationships between UAVs. An example of the network layer is shown in Figure 5. To obtain a low-dimensional representation of the network, instead of directly using the similarity given in Formula (6), all UAVs share a learnable weight matrix $W$, which is used for the linear transformation of the original features. For UAV $v_i$, the shared weight matrix $W$ is first used to linearly transform its own and its adjacent UAVs' original features $x_i$, $x_j$ to obtain $W x_i$ and $W x_j$. Then, self-attention is applied to itself and its adjacent UAVs, based on the following function:
$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_K}}\right) V, \tag{7}$$
where $Q$ is the query vector, i.e., $W x_i$, representing $v_i$'s attention to other adjacent UAVs; $K$ is the key vector, which is the information carrier matched against $Q$, and the similarity between $Q$ and $K$ is calculated to measure the importance of different adjacent UAVs to $v_i$; and $V$ is the value vector, storing and updating the feature information of $v_j$. In the proposed setting, both $K$ and $V$ equal $W x_j$, and $d_K$ is the dimension of $K$. Here, the Softmax function is used to convert the similarity scores between $v_i$ and $v_j$ into a probability distribution, representing the relative weights of different adjacent UAVs.
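A minimal sketch of the scaled dot-product attention in Equation (7) is given below, with $K = V = W x_j$ as in the text; the tensor shapes and variable names are illustrative assumptions.

```python
# Sketch of the scaled dot-product attention in Equation (7); the setting
# K = V = W x_j follows the text, while shapes are assumptions.
import torch

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]                               # dimension of K
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # similarity of Q and K
    weights = torch.softmax(scores, dim=-1)         # relative weights
    return weights @ V                              # weighted value update

W = torch.randn(16, 20)      # shared learnable weight matrix W
x = torch.rand(5, 20)        # raw features of v_i and its 4 neighbors
h = x @ W.T                  # linear transform: W x
out = scaled_dot_product_attention(h[:1], h, h)   # v_i attends to neighbors
```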
Subsequently, a shared attention mechanism is adopted in each GAT layer to learn the correlation weights between adjacent UAVs. The attention weight between $v_i$ and $v_j$ at the $l$-th layer, which represents the value of attention that UAV $v_i$ pays to $v_j$, is calculated as
$$e_{ij}^{(l)} = \mathrm{LeakyReLU}\!\left(a^{\top}\left[W^{(l)} h_i^{(l-1)} \,\Big\|\, W^{(l)} h_j^{(l-1)}\right]\right), \tag{8}$$
where $h_i^{(l-1)}$ and $h_j^{(l-1)}$ represent the feature embedding vectors at the $(l-1)$-th layer, with $h_i^{(0)} = x_i$; $a \in \mathbb{R}^{2d'}$ represents the weight of the single-layer feedforward network adjacent to the attention mechanism; $d'$ is the feature dimension after the linear transformation using $W^{(l)}$; $\|$ represents the concatenation operation; and LeakyReLU is an activation function. To facilitate comparison of the attention degrees to different adjacent UAVs, the Softmax function is used to normalize the attention coefficients of all adjacent UAVs of $v_i$, including itself, i.e., $j \in \mathcal{N}_i \cup \{i\}$, to obtain the attention weight:
$$\alpha_{ij}^{(l)} = \frac{\exp\!\big(e_{ij}^{(l)}\big)}{\sum_{k \in \mathcal{N}_i \cup \{i\}} \exp\!\big(e_{ik}^{(l)}\big)}, \tag{9}$$
where $\alpha_{ij}^{(l)}$ represents the normalized attention coefficient between UAVs $v_i$ and $v_j$. Based on $\alpha_{ij}^{(l)}$, the features of adjacent UAVs, including the UAV's own features, are aggregated, and a weighted sum is performed to obtain the updated feature embedding vector $h_i^{(l)}$ of UAV $v_i$ at the $l$-th layer:
$$h_i^{(l)} = \sigma\!\left(\sum_{j \in \mathcal{N}_i \cup \{i\}} \alpha_{ij}^{(l)}\, W^{(l)} h_j^{(l-1)}\right), \tag{10}$$
where $\sigma$ represents the ReLU activation function.
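The following sketch implements one such GAT layer (Equations (8)-(10)) under the assumption of a dense adjacency mask with self-loops; the class and attribute names are illustrative, not the paper's code.

```python
# Sketch of one GAT layer (Equations (8)-(10)); the dense adjacency mask
# and module names are assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared weight W
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # feedforward a
        self.leaky_relu = nn.LeakyReLU(0.2)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (N, in_dim), adj: (N, N) with self-loops on the diagonal
        Wh = self.W(h)                                    # (N, out_dim)
        N = Wh.size(0)
        # Concatenate every (i, j) pair: [W h_i || W h_j], Equation (8)
        pairs = torch.cat(
            [Wh.unsqueeze(1).expand(N, N, -1),
             Wh.unsqueeze(0).expand(N, N, -1)], dim=-1)
        e = self.leaky_relu(self.a(pairs)).squeeze(-1)    # (N, N) scores
        e = e.masked_fill(adj == 0, float("-inf"))        # neighbors only
        alpha = torch.softmax(e, dim=-1)                  # Equation (9)
        return F.relu(alpha @ Wh)                         # Equation (10)

layer = GATLayer(in_dim=20, out_dim=16)
h = torch.rand(5, 20)
adj = torch.ones(5, 5)        # all elements initialized to 1, as in A
out = layer(h, adj)           # (5, 16) updated embeddings
```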
To enhance the generalization ability of the GAT model, this paper adopts the multi-head attention mechanism shown in Figure 6 to more comprehensively capture graph structure information. First, the query weight matrix $W_Q$, key weight matrix $W_K$, and value weight matrix $W_V$ are extracted separately from the embedding vector $h_i^{(l-1)}$. Then, after the scaled dot-product calculation and matrix concatenation, the multi-head output is obtained. Under the multi-head attention mechanism, the update method for UAV $v_i$'s embedding is as follows:
$$h_i^{(l)} = \sigma\!\left(\frac{1}{M} \sum_{m=1}^{M} \sum_{j \in \mathcal{N}_i \cup \{i\}} \alpha_{ij}^{(l),m}\, W^{(l),m}\, h_j^{(l-1)}\right), \tag{11}$$
where $M$ is the number of attention heads and $\alpha_{ij}^{(l),m}$ represents the normalized attention coefficient calculated by the $m$-th attention head. Averaging over the heads is used instead of concatenation to maintain the consistency of the dimensions of $h_i^{(l)}$ and the original input $x_i$. After multiple GAT layer operations, this module outputs the spatial feature embedding vectors of all UAVs, $H = \{h_1, h_2, \ldots, h_N\}$.
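A possible realization of the averaged multi-head update in Equation (11), reusing the GATLayer sketch above, is shown below; the head count is an arbitrary illustrative choice.

```python
# Sketch of the averaged multi-head update in Equation (11), reusing the
# GATLayer from the previous sketch; M = 4 heads is an arbitrary choice.
import torch
import torch.nn as nn

class MultiHeadGATLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, num_heads: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            GATLayer(in_dim, out_dim) for _ in range(num_heads))

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Averaging (rather than concatenating) the per-head outputs keeps
        # the embedding dimension fixed across layers, as in the text.
        # Note: each head applies its own activation here, a slight
        # simplification of Equation (11), where sigma wraps the average.
        return torch.stack([head(h, adj) for head in self.heads]).mean(dim=0)
```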
This paper optimizes the GAT model of the spatial feature extraction module through negative sampling, specifically using the binary cross-entropy loss function:
$$\mathcal{L}_{\mathrm{GAT}} = -\sum_{(v_i, v_j) \in \Omega_p} \log \mathrm{sigmoid}\big(\mathrm{sim}(h_i, h_j)\big) - \sum_{(v_i, v_k) \in \Omega_n} \log \Big(1 - \mathrm{sigmoid}\big(\mathrm{sim}(h_i, h_k)\big)\Big), \tag{12}$$
where $\mathrm{sim}(\cdot, \cdot)$ represents the cosine similarity metric function, and $\Omega_p$ and $\Omega_n$ are the sets of positive node pairs and negative node pairs in graph $G$, respectively. A pair $(v_i, v_j) \in \Omega_p$ indicates that nodes $v_i$ and $v_j$ have an adjacency relationship, forming a positive node pair with strong association or high feature similarity, while a pair $(v_i, v_k) \in \Omega_n$ consists of two randomly sampled non-adjacent UAVs, forming a negative node pair with weak or no association. The goal of training the GAT model is to maximize the similarity of the feature embeddings of positive node pairs while minimizing the similarity of the feature embeddings of negative node pairs.
4.3.2. Temporal Feature Extraction Module
After obtaining the spatial feature embedding vector $h_i$ for each node, a VAE model based on an attention mechanism is further used to extract the temporal dependencies of the multidimensional ranging sequence. Figure 7 details the basic structure of the temporal feature extraction module and the training method of the proposed anomaly detection model. First, an LSTM-based attention mechanism is used to capture the importance of different time steps within the time window. Compared to ordinary LSTM networks, the attention-based LSTM learns non-fixed weight parameters $\alpha_t$ during training, i.e., it assigns dynamic weights to the inputs and then emphasizes the information of important time steps based on the degree of correlation between time steps.
The hidden state $s_{t-1}$ and cell state $c_{t-1}$ from the previous time step can be calculated using the LSTM recurrence in Equation (13):
$$s_t,\, c_t = \mathrm{LSTM}\big(x_t,\, s_{t-1},\, c_{t-1}\big). \tag{13}$$
Combined with the input $x_t$ at the current time step, they are fed into a linear layer to obtain $e_t$:
$$e_t = W_e\left[s_{t-1} \,\|\, c_{t-1} \,\|\, x_t\right] + b_e, \tag{14}$$
where $W_e$ and $b_e$ represent the weight and bias term of the linear layer, respectively. After Softmax normalization, the weight $\alpha_t$ is obtained:
$$\alpha_t = \frac{\exp(e_t)}{\sum_{\tau=1}^{w} \exp(e_\tau)}. \tag{15}$$
For UAV $v_i$, its feature embedding vector $h_i$ is processed through the LSTM layer and the attention layer, and the output $\tilde{h}_i$ is given by
$$\tilde{h}_i = \sum_{t=1}^{w} \alpha_t\, s_t. \tag{16}$$
Aggregating the $\tilde{h}_i$ of all UAVs yields the entire output $\tilde{H}$ of the spatial feature embedding vectors after passing through the attention mechanism-based LSTM, i.e., the weighted embedding of spatio-temporal features.
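A minimal sketch of the attention-based LSTM of Equations (13)-(16) is given below; the per-step scoring over $[s_{t-1} \,\|\, c_{t-1} \,\|\, x_t]$ follows the text, while the dimensions and the LSTMCell-based loop are assumptions.

```python
# Sketch of the attention-based LSTM (Equations (13)-(16)); dimensions and
# the explicit LSTMCell loop are illustrative assumptions.
import torch
import torch.nn as nn

class AttentionLSTM(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.cell = nn.LSTMCell(in_dim, hidden_dim)
        # Linear scoring layer over [s_{t-1} || c_{t-1} || x_t], Eq. (14)
        self.score = nn.Linear(2 * hidden_dim + in_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, in_dim); returns the weighted summary of Eq. (16)
        B, T, _ = x.shape
        s = x.new_zeros(B, self.cell.hidden_size)
        c = x.new_zeros(B, self.cell.hidden_size)
        states, scores = [], []
        for t in range(T):
            # e_t scored from the previous states and the current input
            scores.append(self.score(torch.cat([s, c, x[:, t]], dim=-1)))
            s, c = self.cell(x[:, t], (s, c))   # LSTM recurrence, Eq. (13)
            states.append(s)
        alpha = torch.softmax(torch.cat(scores, dim=-1), dim=-1)  # Eq. (15)
        S = torch.stack(states, dim=1)          # (B, T, hidden_dim)
        return (alpha.unsqueeze(-1) * S).sum(dim=1)               # Eq. (16)

model = AttentionLSTM(in_dim=16, hidden_dim=32)
out = model(torch.rand(5, 20, 16))   # 5 UAVs, 20-step window
```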
Subsequently, the resulting $\tilde{H}$, which has sequentially extracted spatial feature embeddings and temporal dependency features, is input into an unsupervised VAE model. Through training, the feature patterns of normal sequence data are learned, enabling more effective discrimination of significant differences between normal samples and potential anomalous samples and providing a basis for anomaly detection.
The VAE compresses the high-dimensional features $\tilde{H}$ into a low-dimensional latent representation $z$ through dimensionality reduction, and then reconstructs $\tilde{H}$ based on $z$ to obtain the reconstructed output $\hat{H}$. Specifically, the VAE maps an originally simple probability distribution to the true probability distribution of the training set, where $\hat{H}$ is generated based on the sampling and parameters of the input high-dimensional features $\tilde{H}$. The latent representation $z$ both contains the key information of $\tilde{H}$ and satisfies a normal distribution. The probability of $\tilde{H}$ can be calculated from $z$ using the total probability formula:
$$p(\tilde{H}) = \int p(z)\, p(\tilde{H} \mid z)\, \mathrm{d}z, \tag{17}$$
where $p(z)$ is the probability of the latent representation $z$, and $p(\tilde{H} \mid z)$ is the probability of $\tilde{H}$ given $z$. However, $z$ resides in a high-dimensional space and cannot be enumerated, making it difficult to compute $p(\tilde{H})$. Moreover, the posterior distribution is also difficult to compute:
$$p(z \mid \tilde{H}) = \frac{p(\tilde{H} \mid z)\, p(z)}{p(\tilde{H})}. \tag{18}$$
Therefore, this paper introduces the encoder of the VAE as an inference model $q_\phi(z \mid \tilde{H})$ to approximate the posterior distribution $p_\theta(z \mid \tilde{H})$, and the decoder $p_\theta(\tilde{H} \mid z)$ as a generative model to address the above problems, where $\theta$ and $\phi$ are the learnable parameters of the generative model and inference model, respectively. During VAE model training, the mean $\mu$ and variance $\sigma^2$ parameters of the latent space representation $z$ are trained using samples. To enable backpropagation, the reparameterization trick is used, i.e., sampling noise $\epsilon$ from a standard normal distribution and computing $z$ based on it:
$$z = \mu + \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \tag{19}$$
where $\odot$ represents element-wise multiplication, and $\epsilon$ does not participate in the gradient calculation process.
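The reparameterization of Equation (19) can be sketched as follows; the log-variance parameterization is a common convention assumed here for numerical stability.

```python
# Sketch of the reparameterization trick in Equation (19); the encoder that
# produces mu and log_var is assumed, not shown.
import torch

def reparameterize(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    sigma = torch.exp(0.5 * log_var)   # standard deviation from log-variance
    eps = torch.randn_like(sigma)      # noise from N(0, I); no gradient path
    return mu + sigma * eps            # z = mu + sigma ⊙ eps

mu, log_var = torch.zeros(5, 8), torch.zeros(5, 8)
z = reparameterize(mu, log_var)        # differentiable w.r.t. mu, log_var
```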
The temporal feature extraction module uses the attention mechanism-based VAE to capture the importance of UAVs in different time windows, learn the latent distribution patterns of normal sequences, and establish a corresponding probability distribution model. The model is trained by maximizing the likelihood of the input. Since the input likelihood is difficult to compute directly, the problem is transformed into maximizing the evidence lower bound of the log-likelihood. Thus, the loss function for training the attention mechanism-based VAE is expressed as
$$\mathcal{L}_{\mathrm{VAE}} = \mathrm{MSE}\big(\tilde{H}, \hat{H}\big) + D_{\mathrm{KL}}\Big(q_\phi(z \mid \tilde{H}) \,\big\|\, \mathcal{N}(0, I)\Big). \tag{20}$$
$\mathcal{L}_{\mathrm{VAE}}$ consists of two parts: a reconstruction loss and a KL divergence regularization term. To train the VAE reconstruction part while simultaneously learning the variable weights of the attention mechanism, the mean squared error (MSE) between the spatial feature embedding vector $\tilde{H}$ and the VAE output $\hat{H}$ is computed in the reconstruction loss part to reflect the difference between them. The optimization goal is to make them as similar as possible, which helps identify anomalous samples that cause significant reconstruction errors during testing. The KL divergence regularization term minimizes the KL divergence between the approximate posterior and the prior of the latent representation $z$, ensuring that the $z$ generated by the inference model $q_\phi(z \mid \tilde{H})$ conforms as closely as possible to a standard normal distribution.
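A sketch of the loss in Equation (20) follows, assuming a diagonal Gaussian approximate posterior so that the KL term takes its standard closed form.

```python
# Sketch of the VAE loss in Equation (20): MSE reconstruction term plus the
# closed-form KL divergence to a standard normal prior (assuming a diagonal
# Gaussian approximate posterior).
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, log_var):
    recon = F.mse_loss(x_hat, x, reduction="sum")   # reconstruction loss
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl

x = torch.rand(5, 20)                        # input embedding (toy)
x_hat = x + 0.05 * torch.randn(5, 20)        # toy reconstruction
mu, log_var = torch.zeros(5, 8), torch.zeros(5, 8)
loss = vae_loss(x, x_hat, mu, log_var)
```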
Finally, we use a joint loss function for end-to-end training of the temporal and spatial feature processing modules, simultaneously optimizing the spatial correlation modeling capability and the temporal-dependency-based normal pattern reconstruction capability of the proposed anomaly detection model, and use the hyperparameter $\lambda$ to balance the relative importance of the two parts:
$$\mathcal{L} = \mathcal{L}_{\mathrm{VAE}} + \lambda\, \mathcal{L}_{\mathrm{GAT}}. \tag{21}$$
Algorithm 1 describes the training process of the proposed anomaly detection model. Additionally, during model training and testing, unlike the common practice of calculating reconstruction errors per timestamp in general anomaly detection models, this paper calculates the reconstruction errors of the ranging data at different timestamps along the UAV dimension, providing a basis for the subsequent UAV-dimensional scoring mechanism. The UAV-dimensional reconstruction error is taken as the absolute error between the original input and the reconstructed output of the VAE reconstruction part at the corresponding position.
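As a sketch of this UAV-dimensional error computation, assuming the inputs are arranged as (UAV, timestamp) arrays:

```python
# Sketch of the UAV-dimension reconstruction error described above: absolute
# errors are kept per UAV rather than aggregated per timestamp. The (uav,
# time) axis layout and the mean-based score are illustrative assumptions.
import numpy as np

def uav_reconstruction_error(x: np.ndarray, x_hat: np.ndarray) -> np.ndarray:
    """Absolute error per (UAV, timestamp); x, x_hat: (num_uavs, T)."""
    return np.abs(x - x_hat)

x = np.random.rand(5, 20)                     # original ranging-derived input
x_hat = x + 0.01 * np.random.randn(5, 20)     # toy VAE reconstruction
err = uav_reconstruction_error(x, x_hat)      # (5, 20) per-position errors
score = err.mean(axis=1)                      # one anomaly score per UAV
```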