1. Introduction
The rapid deployment of fifth-generation (5G) and beyond wireless networks has positioned unmanned aerial vehicles (UAVs) as a transformative platform for delivering communication and computation services to ground mobile devices. UAVs are increasingly utilized in diverse scenarios, including sports stadiums, outdoor events, traffic hotspots, and remote regions, where they function as aerial base stations (BSs) or edge servers [1,2,3]. Leveraging their agility, mobility, and favorable line-of-sight (LoS) propagation, UAV-enabled networks facilitate rapid-response wireless communications, distributed data collection, artificial intelligence (AI) model training, and dynamic coverage enhancement [1]. These capabilities have spurred significant research into UAV-assisted computation and communication for next-generation wireless networks [3].
Despite their advantages, UAVs face challenges in processing and training large-scale datasets due to inherent limitations in energy, storage, and computational capacity [1,2]. Moreover, transmitting raw data from numerous ground devices to UAV servers is often impractical due to privacy concerns and constrained communication resources [4]. Federated learning (FL) [5,6] offers a privacy-preserving solution by enabling devices to train AI models locally without sharing raw data. In FL, devices perform local model updates using gradient descent optimization [7] and share only model parameters with a central server for aggregation, thereby reducing data transmission and enhancing privacy. Integrating FL into UAV-assisted networks enables distributed AI tasks without reliance on centralized BSs. Ground devices train models locally and upload model parameters to an FL-enabled UAV server, which aggregates them to update a global model. As shown in Figure 1, the UAV broadcasts the updated global model back to the devices for further refinement. This iterative process continues until the desired learning accuracy is achieved, minimizing network congestion and preserving data privacy. Compared to traditional cloud-centric frameworks, FL enhances the efficiency of AI model training in UAV-assisted wireless networks.
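As a concrete illustration of this iterative process, one FedAvg-style round (the baseline aggregation rule, not the robust scheme proposed later in this paper) can be sketched in a few lines; models are flat lists of floats, and `local_update` is a toy client step of our own devising:

```python
# Illustrative sketch of one federated learning round with FedAvg-style
# weighted averaging. This is NOT the aggregation rule proposed in this paper.

def local_update(global_model, client_data, lr=0.1):
    """Hypothetical client step: nudge each parameter toward the mean
    of the client's local scalar data (stand-in for gradient descent)."""
    target = sum(client_data) / len(client_data)
    return [w + lr * (target - w) for w in global_model]

def fedavg_round(global_model, clients):
    """Aggregate client models, weighted by local dataset size."""
    total = sum(len(data) for data in clients)
    updates = [(local_update(global_model, data), len(data)) for data in clients]
    return [
        sum(model[i] * n for model, n in updates) / total
        for i in range(len(global_model))
    ]

global_model = [0.0, 0.0]
clients = [[1.0, 1.0], [2.0], [3.0, 3.0, 3.0]]
for _ in range(50):  # iterate rounds until near convergence
    global_model = fedavg_round(global_model, clients)
```

With these toy clients, the global model converges toward the size-weighted mean of the clients' local targets.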
However, the distributed nature of FL renders it susceptible to poisoning attacks, where adversaries compromise local training data or model parameters [8]. Poisoning attacks are categorized as non-targeted, which degrade overall model performance [9], or targeted, which manipulate predictions for specific inputs [10]. A key vulnerability in FL systems arises from the potential untrustworthiness of participating clients, some of which may be malicious or compromised. For instance, adversaries may use projected gradient descent to craft local models that mislead the global model's convergence [11]. Additionally, surrogate models mimicking target model behavior pose significant security risks [12]. These threats underscore the need for robust detection and defense mechanisms against poisoning attacks in FL environments.
The widely adopted Federated Averaging (FedAvg) algorithm is particularly vulnerable, as even rudimentary attack strategies can disrupt its convergence [13]. Existing defenses include robust aggregation algorithms [14,15] and anomaly detection techniques [16]. Robust aggregation methods employ weighting or filtering to mitigate anomalous updates but falter when many clients are malicious. Anomaly detection approaches analyze update patterns to identify malicious clients, yet they often suffer from high false-positive and false-negative rates due to the complexity of attack strategies [17]. Gradient-based filtering methods struggle with coordinated attacks, while geometric median or gradient analysis approaches offer partial mitigation but compromise training efficiency and stability against sophisticated attacks.
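As one minimal example of the robust aggregation family discussed above, a coordinate-wise median bounds the pull any single malicious update can exert on each parameter; this is an illustrative sketch of the general technique, not a method from this paper:

```python
from statistics import median

def coordwise_median(updates):
    """Robust aggregation: take the median of each parameter dimension
    across client updates, limiting the influence of outlier values."""
    dims = len(updates[0])
    return [median(u[i] for u in updates) for i in range(dims)]

benign = [[0.9, 1.1], [1.0, 1.0], [1.1, 0.9]]
malicious = [[100.0, -100.0]]  # one attacker sends extreme values
agg = coordwise_median(benign + malicious)
# The aggregate stays near the benign consensus despite the outlier.
```

As noted above, such medians degrade once malicious clients approach a majority, since the median itself can then be shifted.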
Despite notable progress in designing secure federated learning frameworks, existing approaches exhibit critical limitations, particularly in UAV-assisted networks where resource constraints and data heterogeneity are pronounced. Many robust aggregation strategies, while effective against certain classes of attack, often suffer substantial performance degradation when confronted with highly coordinated or stealthy poisoning strategies or when the proportion of malicious clients increases. Anomaly detection techniques also tend to exhibit limited reliability under non-IID data distributions, producing high rates of false positives and negatives that undermine both the efficiency and trustworthiness of the learning process. Furthermore, the majority of prior proposals are challenged by the computational limitations of UAV hardware, as they frequently involve heavyweight model architectures or multiple training iterations, resulting in excessive local computation or communication overhead. These issues are compounded in practical deployments where bandwidth is severely limited, data is unlabeled, and sophisticated attackers can dynamically adapt to existing defense mechanisms. Model compression techniques, such as pruning and quantization, can reduce communication overhead in FL. However, these methods are ineffective against sophisticated model poisoning attacks, which manipulate the direction and magnitude of model parameter updates in a targeted manner. Pruning and quantization cannot reliably identify the affected neurons, and the information loss introduced by these techniques may even exacerbate model vulnerability, leading to a degradation in the main task accuracy.
To address these challenges, we propose FedULite, a federated learning framework specifically designed for UAV-assisted networks facing non-IID data and poisoning attacks. FedULite introduces a lightweight and robust local training pipeline that leverages unsupervised representation learning on unlabeled data, making it well-suited for resource-constrained UAV platforms. To further defend against model poisoning attacks, FedULite employs a robust server-side aggregation mechanism that adaptively filters and aggregates client updates based on update consistency and cosine similarity thresholds. Although techniques such as contrastive learning and cosine similarity-based filtering have been independently investigated in prior research, the core innovation of FedULite lies in its synergistic defense mechanism. At the client side, we employ lightweight unsupervised contrastive learning as the first line of defense. This approach not only addresses the challenges of unlabeled and non-IID data prevalent in UAV scenarios but also significantly enhances the robustness of local model feature representations before model updates are uploaded, thereby mitigating the impact of potential poisoning attacks at the source. On the server side, robust adaptive aggregation serves as the second line of defense. The local training in the first stage ensures that updates from benign clients exhibit greater consistency, substantially improving the accuracy of server-side cosine similarity-based filtering. This enables more effective identification of malicious models that deviate from the dominant update patterns. The two-stage design collectively forms a lightweight yet highly efficient defense framework, specifically tailored for resource-constrained environments such as UAVs. The holistic integration and synergistic effect of this approach represent the primary distinction and advantage over frameworks that rely solely on individual defense strategies.
The main contributions of this work are summarized as follows.
We introduce a computationally efficient FL framework optimized for UAV-assisted networks, leveraging unsupervised contrastive learning and lightweight architectures to enable robust representation learning on resource-constrained clients, addressing the computational and storage limitations of UAVs.
We develop a robust, adaptive aggregation method at the server, which combines cosine similarity-based update filtering and dimension-wise aggregation with adaptive learning rates, effectively countering both traditional and adaptive model poisoning strategies, including stealthy and coordinated attacks.
We provide extensive experimental validation, demonstrating that FedULite not only significantly improves robustness and efficiency in UAV-assisted federated learning but also achieves reliable convergence and strong resistance to adversarial disruptions under diverse real-world conditions.
The remainder of this paper is organized as follows.
Section 2 reviews related work on federated-learning-assisted UAV networks, backdoor attacks, and defense mechanisms in federated learning.
Section 3 presents the proposed FedULite method in detail.
Section 4 evaluates and analyzes the experimental results. Finally,
Section 5 concludes the paper and discusses future research directions.
3. Threat Model
Attack model: We adopt the attack model established in prior works [45,46,57]. Specifically, the adversary controls one or more malicious clients, which may include fabricated clients injected into the system. The adversary does not compromise the server. During each iteration of the FL training process, malicious clients can send arbitrary local model updates to the server.
Attacker’s goal: Poisoning attacks in FL are generally categorized into untargeted poisoning attacks, which degrade overall model performance, and targeted poisoning attacks, which induce erroneous predictions on specific target samples while maintaining high overall model accuracy to evade detection. This study focuses on targeted poisoning attacks due to their stealthier nature.
Attacker’s capability: The adversary possesses full control over the training process of compromised clients, enabling direct manipulation of their model parameters in each attack cycle. The adversary has access to the global model updates in each training round and knowledge of the local data and model parameters of controlled clients. However, the adversary cannot access the server’s aggregation algorithm, defense mechanisms, or the data and training strategies of benign clients.
Defender’s goal: Existing poisoning defense methods are predominantly server-side, assuming a trusted server, but their performance degrades sharply as the proportion of malicious clients increases. This work aims to design a lightweight, localized poisoning defense method to achieve the following objectives: (1) Utility: In the absence of attacks, the method should not compromise the classification accuracy of the global model, achieving performance comparable to the widely used FedAvg algorithm in non-adversarial settings. (2) Robustness: In the presence of strong targeted poisoning attacks, the global model trained with our method should not predict attacker-specified labels for attacker-chosen target test samples. (3) Lightweight: Given the resource-constrained nature of clients in UAV-assisted FL systems, the method should impose minimal additional computational workload.
4. Method
4.1. Overview of FedULite
Figure 2 presents FedULite, a federated learning framework tailored for unmanned aerial vehicle (UAV)-assisted scenarios, addressing the challenges of non-independent and identically distributed (non-IID) data and poisoning attacks. FedULite leverages unlabeled data to train a robust global encoder, enabling effective representation learning in resource-constrained and dynamic network environments. To accommodate the computational limitations of UAV clients, the framework employs lightweight local training with optimized architectures and minimal computational overhead. At the server, a robust aggregation mechanism mitigates model poisoning attacks, such as backdoor and model replacement attacks [23,24,25], by prioritizing consistent client updates. A streamlined bidirectional communication protocol facilitates the exchange of online encoders and predictors between clients and the server, with the server distributing the aggregated global encoder and predictor. The FedULite framework operates through three core stages:
Local Training: Each client performs unsupervised representation learning with a lightweight contrastive loss, training an online encoder, a predictor, and a target encoder, with anomaly detection to counter data poisoning.
Model Aggregation: The server aggregates client models using a robust strategy that evaluates update consistency, producing an updated global encoder and predictor.
Model Update: The server distributes the updated global encoder and predictor, which clients adopt to update their local models.
Algorithm 1 encapsulates FedULite, detailing its lightweight training, robust aggregation, and update mechanisms.
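The three stages can be pictured as a single server loop; in the schematic below, models are single floats purely for illustration, and `local_train` and `trimmed_mean` are simplified stand-ins of our own devising for the procedures detailed in Sections 4.2 and 4.3:

```python
# Schematic view of the three FedULite stages per communication round.
# Models are single floats here; local_train and trimmed_mean are toy
# stand-ins, not the actual FedULite procedures.

def local_train(encoder, predictor, client_stat):
    """Stage 1 stand-in: a client refines its copy of the global models."""
    return encoder + 0.5 * (client_stat - encoder), predictor + 0.1

def trimmed_mean(values):
    """Stage 2 stand-in: average after dropping the two extreme values."""
    values = sorted(values)
    trimmed = values[1:-1] if len(values) > 2 else values
    return sum(trimmed) / len(trimmed)

encoder, predictor = 0.0, 0.0
client_stats = [1.0, 1.2, 0.8, 9.0]  # the last client behaves abnormally
for _ in range(10):
    results = [local_train(encoder, predictor, s) for s in client_stats]
    # Stage 2: aggregate encoders and predictors separately.
    encoder = trimmed_mean([e for e, _ in results])
    predictor = trimmed_mean([p for _, p in results])
    # Stage 3: broadcast -- every client starts the next round from
    # the freshly aggregated (encoder, predictor) pair.
```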
4.2. Local Robust Training
In FedULite, local training employs a dual-network architecture comprising an online network, consisting of an online encoder $f_\theta$ with parameters $\theta$ and a predictor $q_\phi$, and a target network, represented by a target encoder $f_\xi$ with parameters $\xi$. This architecture facilitates unsupervised representation learning, optimized for the resource-constrained environment of UAV clients while ensuring robustness against data poisoning attacks, such as label flipping or adversarial samples.
To mitigate the impact of potentially malicious encoders in federated learning, FedULite employs a strategic initialization and update policy for local models. In the first training round, benign clients initialize both their online encoder $\theta_k$ and target encoder $\xi_k$ with the global encoder $\theta_g$, ensuring a clean starting point free from malicious influence. In subsequent rounds, the online encoder $\theta_k$ and predictor $\phi_k$ are set to the global encoder $\theta_g$ and predictor $\phi_g$, respectively, while the target encoder $\xi_k$ adopts the parameters of the previous round's local online encoder $\theta_k^{t-1}$. This approach leverages the historical, presumably benign, local encoder as the target to stabilize training and counteract potential malicious updates aggregated into the global encoder. By using the previous round's local encoder as a reference, FedULite disperses the influence of malicious clusters, enhancing the robustness of the global model against backdoor attacks.
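A minimal sketch of this initialization and update policy, with model parameters represented as plain lists (the function name and argument layout are our own illustrative choices):

```python
def init_local_models(round_t, global_enc, global_pred, prev_local_enc):
    """FedULite-style local initialization policy (illustrative sketch).

    Round 1: both encoders start from the clean global encoder.
    Later rounds: the online encoder and predictor track the global
    models, while the target encoder reuses the client's own previous
    online encoder, so a poisoned global update cannot directly steer
    the regression target.
    """
    if round_t == 1:
        online_enc = list(global_enc)
        target_enc = list(global_enc)
    else:
        online_enc = list(global_enc)
        target_enc = list(prev_local_enc)
    predictor = list(global_pred)
    return online_enc, predictor, target_enc
```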
Algorithm 1: The FedULite process.
Data: number of clients N, communication rounds T, local epochs. Result: global encoder and global predictor.
Training commences with the generation of two augmented views, $t$ and $t'$, from an input image $x$, employing computationally efficient transformations, such as random cropping and horizontal flipping, to minimize preprocessing overhead. Both the online and target encoders are implemented as lightweight convolutional neural networks, designed to reduce model complexity while preserving expressive capacity. The predictor is a compact multilayer perceptron (MLP), further reducing computational demands.
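The two cheap augmentations mentioned above can be sketched directly on a raw 2D array as follows (pure Python; crop size and flip probability are illustrative):

```python
import random

def random_crop(img, size):
    """Crop a size x size window at a random position from a 2D list."""
    h, w = len(img), len(img[0])
    top = random.randrange(h - size + 1)
    left = random.randrange(w - size + 1)
    return [row[left:left + size] for row in img[top:top + size]]

def horizontal_flip(img, p=0.5):
    """Flip every row left-to-right with probability p."""
    return [row[::-1] for row in img] if random.random() < p else img

def two_views(img, size):
    """Generate the two augmented views fed to the online and target branches."""
    def aug():
        return horizontal_flip(random_crop(img, size))
    return aug(), aug()
```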
The online network is optimized using a contrastive loss that aligns the normalized outputs of the online and target networks:

$$\mathcal{L}_k = \big\| \bar{q}_\phi\big(f_\theta(t)\big) - \bar{f}_\xi(t') \big\|_2^2, \tag{1}$$

where $\bar{q}_\phi(f_\theta(t))$ represents the $\ell_2$-normalized output of the online network, and $\bar{f}_\xi(t')$ serves as the regression target provided by the target network. The parameters of the target network, $\xi$, are updated via an exponential moving average (EMA) of the online encoder's parameters $\theta$:

$$\xi \leftarrow \tau \xi + (1 - \tau)\,\theta, \tag{2}$$

where the decay rate $\tau$ is dynamically adjusted based on training dynamics, such as loss variance or communication rounds, to balance stability and adaptability in non-IID settings. To enhance computational efficiency, each client performs a single training iteration per communication round on a small batch of samples, significantly reducing overhead compared to multi-iteration training.
To leverage labeled data available at each client and align with the primary objectives of federated learning, FedULite incorporates supervised learning when labels are present. The online encoder $f_\theta$ is optimized with a cross-entropy loss over labeled samples:

$$\mathcal{L}_{\mathrm{CE}} = \frac{1}{|D_k|} \sum_{(x, y) \in D_k} \ell_{\mathrm{CE}}\big(f_{\theta_k^t}(x),\, y\big), \tag{3}$$

where $D_k$ denotes the local labeled dataset of client $k$, and $\theta_k^t$ represents the parameters of $f_\theta$ at round $t$. This supervised objective complements the unsupervised contrastive loss, enhancing the encoder's ability to capture task-specific features while maintaining robustness in non-IID settings.
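The per-sample cross-entropy term can be computed as in the following numerically stable sketch, which operates directly on logits (the wiring from encoder output to logits is omitted here):

```python
import math

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample:
    log-sum-exp of the logits minus the logit of the true label."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def batch_ce(batch):
    """Mean cross-entropy over a labeled local dataset of (logits, label) pairs."""
    return sum(cross_entropy(z, y) for z, y in batch) / len(batch)
```

Subtracting the maximum logit before exponentiating avoids overflow for large activations without changing the result.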
The communication protocol is designed for efficiency: clients upload only the online encoder $f_\theta$ and predictor $q_\phi$, as the online network encapsulates the most recent data representations, while the target encoder remains local to provide stable regression targets. This approach minimizes uplink communication overhead, a critical consideration for bandwidth-constrained UAV networks.
4.3. Robust Adaptive Aggregation
To enhance the robustness of the global model against poisoning attacks, FedULite employs a sophisticated aggregation strategy at the server, supplanting traditional weighted averaging. It is crucial to note that our robust adaptive aggregation strategy fundamentally relies on the server's ability to access and inspect the individual model update, $\Delta\theta_k$, from each client prior to aggregation. This access is essential for computing the cosine similarities used in the filtering stage (Equation (4)) and for assessing the sign consistency of each parameter dimension (Equation (5)). This requirement therefore makes FedULite, in its current form, incompatible with standard secure aggregation protocols, which are designed to provide server-side privacy by hiding these very updates.
For each client $k$, model updates are computed as $\Delta\theta_k = \theta_k - \theta_g^{t-1}$ for the online encoder and $\Delta\phi_k = \phi_k - \phi_g^{t-1}$ for the predictor, where $\theta_g^{t-1}$ and $\phi_g^{t-1}$ represent the global parameters from the previous communication round. The aggregation process leverages a two-stage mechanism to evaluate update consistency, effectively neutralizing malicious contributions while preserving model convergence in non-IID settings.
In the first stage, updates are pre-filtered based on their alignment with the mean update across clients:

$$\mathcal{K} = \Big\{ k : \cos\big(\Delta\theta_k,\, \overline{\Delta\theta}\big) \geq \delta \Big\}, \qquad \overline{\Delta\theta} = \frac{1}{N} \sum_{k=1}^{N} \Delta\theta_k, \tag{4}$$

where $\delta$ is a cosine similarity threshold. This filtering excludes updates that significantly deviate from the majority, thereby mitigating stealthy attacks, such as cosine-constrained poisoning attempts [29]. In the second stage, the server assesses the consistency of updates along each parameter dimension $i$ by computing the sum of update signs:

$$s_i = \sum_{k \in \mathcal{K}} \mathrm{sgn}\big(\Delta\theta_{k,i}\big). \tag{5}$$

A dimension-specific learning rate $\eta_i$ is then assigned:

$$\eta_i = \begin{cases} \eta, & |s_i| \geq \kappa, \\ -\eta, & \text{otherwise}, \end{cases} \tag{6}$$

where $\eta$ is the base learning rate, and $\kappa$ is a threshold dynamically adjusted based on the number of participating clients to accommodate the variable client participation inherent in UAV networks. The global encoder and predictor parameters are updated as follows:

$$\theta_g^{t} = \theta_g^{t-1} + \sum_{k \in \mathcal{K}} w_k\, \boldsymbol{\eta} \odot \Delta\theta_k, \qquad \phi_g^{t} = \phi_g^{t-1} + \sum_{k \in \mathcal{K}} w_k\, \boldsymbol{\eta} \odot \Delta\phi_k, \tag{7}$$

where $\boldsymbol{\eta} = (\eta_1, \eta_2, \ldots)$, and ⊙ denotes element-wise multiplication. The weight $w_k = n_k / \sum_{j \in \mathcal{K}} n_j$, where $n_k$ represents the data volume of client $k$, ensures weighted contributions proportional to local dataset sizes. This mechanism suppresses the influence of inconsistent updates, effectively mitigating poisoning attacks while maintaining robust convergence.
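The two-stage aggregation can be sketched as follows (pure Python, updates as flat lists; the threshold values are illustrative, and the sign-disagreement branch adopts a robust-learning-rate-style flip as one plausible reading of the dimension-wise rule):

```python
import math

def cosine(u, v):
    """Cosine similarity between two flat update vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def robust_aggregate(updates, sizes, base_lr=1.0, sim_thresh=0.0, sign_thresh=2):
    """Two-stage robust aggregation sketch (assumes at least one update
    survives the filter).

    Stage 1: drop updates whose cosine similarity to the mean update
    falls below sim_thresh.
    Stage 2: per dimension, keep the base learning rate only where enough
    surviving clients agree on the update sign; otherwise flip it.
    """
    dims = len(updates[0])
    mean = [sum(u[i] for u in updates) / len(updates) for i in range(dims)]
    kept = [k for k, u in enumerate(updates) if cosine(u, mean) >= sim_thresh]
    total = sum(sizes[k] for k in kept)
    lr = []
    for i in range(dims):
        s = sum(math.copysign(1.0, updates[k][i]) for k in kept)
        lr.append(base_lr if abs(s) >= sign_thresh else -base_lr)
    # Size-weighted, dimension-wise aggregated update.
    return [
        lr[i] * sum(sizes[k] * updates[k][i] for k in kept) / total
        for i in range(dims)
    ]
```

In practice, `sim_thresh` and `sign_thresh` would be tuned to the expected number of participating clients, mirroring the adaptive thresholds described above.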
Upon aggregation, the server distributes the updated global models $\theta_g$ and $\phi_g$ to all clients, who synchronize their local models by setting $\theta_k \leftarrow \theta_g$ and $\phi_k \leftarrow \phi_g$. This streamlined update process ensures model consistency across clients, enhancing robustness in heterogeneous, non-IID data environments characteristic of UAV-assisted FL.
6. Conclusions and Future Work
In this paper, we present FedULite, a lightweight and robust federated learning framework tailored for UAV-assisted wireless networks. Specifically, FedULite enables clients to perform efficient unsupervised local representation learning. To enhance model robustness, FedULite incorporates a two-stage adaptive aggregation strategy at the server side, leveraging cosine similarity-based update filtering and dimension-wise adaptive learning rates to effectively suppress both data and model poisoning attacks. Compared with conventional defense approaches, FedULite achieves superior robustness and efficiency in resource-constrained, adversarial UAV environments, and extensive experiments demonstrate that it ensures reliable learning performance and strong resistance to advanced adversarial threats. Looking ahead, we plan to extend FedULite in several key directions. First, regarding deployment in more realistic UAV environments, the framework already exhibits inherent adaptability. With respect to intermittent connectivity and mobility, the federated learning paradigm of FedULite is designed to tolerate dynamic changes in participant availability: client mobility or temporary disconnection is treated as non-participation in a given aggregation round, and our robust aggregation algorithm remains insensitive to variations in the number of participating clients. With respect to energy constraints, the lightweight design of the framework is a core strength: local training requires only a single iteration, significantly reducing energy consumption for UAVs or ground devices and ensuring feasibility in energy-constrained environments. Second, the assumption of a fully trusted server is a limitation of the current work, and we envision a clear path toward extending FedULite to decentralized or semi-trusted settings by integrating it with blockchain technology.
Model updates and aggregation rules can be deployed as smart contracts on the blockchain, ensuring transparency, immutability, and auditability of the process and enabling reliable collaboration in semi-trusted environments. In addition, we will explore effective integration schemes of FedULite with privacy-enhancing technologies, such as differential privacy and secure aggregation, to build a framework that is secure, robust, and privacy-preserving. Finally, we plan to establish a rigorous theoretical convergence guarantee for our robust aggregation algorithm, a key step in moving this method from empirical validity to theoretical reliability.