We systematically evaluated the proposed algorithm from five perspectives: accuracy, real-time performance, energy consumption, communication, and safety. For accuracy, we employed the UAV123 aerial video dataset to validate the algorithm's performance under dynamic conditions, including scale variation, viewpoint changes, and motion blur. For real-time performance, we recorded the per-frame processing time on an embedded platform, and the results demonstrate that the method met the real-time requirements of UAV vision tasks. Furthermore, within the high-fidelity simulation environment provided by AirSim (version 1.8.1), we monitored energy consumption during flight, measured the end-to-end delay and data volume of image transmission, and evaluated the computational overhead of the algorithm.
Building on this evaluation framework, this section assesses the effectiveness and efficiency of our federated learning parameter integrity verification scheme through simulation experiments. Unlike our approach, most existing methods verify data integrity probabilistically: they randomly select a subset of data blocks from each parameter message, generate an integrity proof for each sampled block using either RSA-based homomorphic functions or BLS signatures, and transmit the proofs to the server for verification. We compared our scheme with two representative approaches, an RSA-based scheme [43] and a BLS-based scheme [44], to evaluate its performance. Effectiveness is defined as the accuracy of detecting corruption in model parameter messages, with a higher detection probability indicating better performance; efficiency was measured by computational overhead, where a lower computation time indicates better efficiency.
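For intuition, the detection capability of such sampling-based schemes follows a simple probabilistic model. The sketch below (a minimal model, assuming each sampled block independently hits a corrupted block with probability equal to the corruption fraction c) illustrates why small samples miss sparse corruption:

```python
def detection_probability(corruption_rate: float, sample_size: int) -> float:
    """Probability that random sampling hits at least one corrupted block.

    Assumes independent sampling, so the chance of missing every
    corrupted block is (1 - c)^s.
    """
    return 1.0 - (1.0 - corruption_rate) ** sample_size

# With 1% of blocks corrupted, a 100-block sample detects tampering with
# probability ~0.63; about 400 blocks are needed to approach ~0.98.
for s in (100, 200, 400):
    print(s, round(detection_probability(0.01, s), 3))
```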
6.2.1. Experimental Set-Up
We evaluated the proposed VHEFL scheme in a large-scale simulated UAV edge computing environment using AirSim and Gazebo (version 11.1.0), which were integrated with ROS 2. The entire testbed was deployed on a high-performance server cluster (8× NVIDIA V100 GPUs, 256 GB RAM), where N virtual UAV nodes were instantiated as containerized edge computing instances, each of which was constrained to 4 vCPUs and 4 GB of RAM to emulate the resource limitations of typical onboard platforms, such as Raspberry Pi 4.
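Such a resource-constrained node can be instantiated, for example, with the Docker Python SDK; in the sketch below, the image name and entrypoint are hypothetical placeholders, while the CPU and memory caps mirror the 4 vCPU / 4 GB configuration described above:

```python
import docker  # docker-py (pip install docker)

client = docker.from_env()

# Placeholder image and entrypoint; the caps emulate a Raspberry Pi 4
# class onboard platform (4 vCPUs, 4 GB RAM).
node = client.containers.run(
    image="vhefl/uav-node:latest",   # hypothetical image name
    command="python uav_node.py",    # hypothetical entrypoint
    nano_cpus=4 * 10**9,             # nano_cpus = CPUs * 1e9
    mem_limit="4g",
    detach=True,
)
```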
In our evaluations, we also examined the computational latency of Schnorr aggregate signature generation and verification in this simulated UAV edge environment. Each UAV node generated an individual signature for its model update, with an average generation latency of 12.8 ms, based on representative values for elliptic curve operations on ARM Cortex-A72 class devices (ARM Limited, Cambridge, UK). On the server side, the aggregate verification latency for 50 concurrent UAVs was 9.4 ms on the same high-performance server cluster. Aggregation thus reduced the server-side verification cost from O(N) individual signature checks to a single O(1) aggregate check, yielding an amortized per-UAV verification overhead of 0.19 ms and a total per-UAV cryptographic overhead of 13.0 ms.
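The following minimal sketch illustrates the aggregate-verification pattern behind these measurements. It uses a toy Schnorr group with small illustrative parameters (a production deployment would use a standard elliptic curve group) and omits rogue-key defenses such as key-prefixed challenges, so it is an illustration rather than a secure implementation:

```python
import hashlib
import secrets

# Toy Schnorr group: p = 2q + 1 with q prime; g generates the order-q
# subgroup. Illustration only; real deployments use full-size groups.
p, q, g = 2039, 1019, 4

def H(*parts) -> int:
    data = b"|".join(str(x).encode() for x in parts)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def keygen():
    x = secrets.randbelow(q - 1) + 1
    return x, pow(g, x, p)            # (private key, public key)

def sign(x, X, msg):
    r = secrets.randbelow(q - 1) + 1
    R = pow(g, r, p)                  # per-signature nonce commitment
    return R, (r + H(R, X, msg) * x) % q

def aggregate(sigs):
    # The server sums the responses; commitments stay per signer.
    return sum(s for _, s in sigs) % q

def verify_aggregate(pubs, msgs, Rs, s_agg):
    # One aggregate check instead of n separate verifications:
    # g^s_agg == prod_i R_i * X_i^{c_i} (mod p)
    rhs = 1
    for X, m, R in zip(pubs, msgs, Rs):
        rhs = rhs * R * pow(X, H(R, X, m), p) % p
    return pow(g, s_agg, p) == rhs

keys = [keygen() for _ in range(50)]           # 50 UAV nodes
msgs = [f"model-update-{i}" for i in range(50)]
sigs = [sign(x, X, m) for (x, X), m in zip(keys, msgs)]
ok = verify_aggregate([X for _, X in keys], msgs,
                      [R for R, _ in sigs], aggregate(sigs))
print(ok)  # True
```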
A key design consideration in our server-side implementation is the separation of out-of-enclave computation. Specifically, the aggregation of model updates is performed entirely in the REE outside the SGX enclave, while the enclave is used exclusively for secure validation of the aggregated model’s integrity and authenticity, namely signature verification. This approach reduces the memory footprint inside the enclave and mitigates the risk of enclave page cache overflow. By confining trusted execution to critical verification tasks only, the system ensures robust protection of sensitive data while avoiding performance degradation caused by the enclave’s limited memory capacity.
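This division of labor can be summarized structurally as follows; the Python stand-in below only models the data flow across the ECALL boundary (real SGX code would be C/C++), with ciphertext addition replaced by plain list addition:

```python
import hashlib

def aggregate_in_ree(encrypted_updates):
    # Runs OUTSIDE the enclave: aggregation touches only ciphertexts, so
    # bulky update data never consumes enclave page cache (EPC) memory.
    agg = list(encrypted_updates[0])
    for upd in encrypted_updates[1:]:
        agg = [a + b for a, b in zip(agg, upd)]  # stand-in for CKKS addition
    return agg

def ecall_verify_integrity(agg_bytes: bytes, expected_digest: bytes) -> bool:
    # Models the single ECALL: only the serialized aggregate and compact
    # verification material cross the trusted boundary.
    return hashlib.sha256(agg_bytes).digest() == expected_digest

updates = [[0.1, 0.2], [0.3, 0.4]]               # mock encrypted updates
blob = repr(aggregate_in_ree(updates)).encode()
print(ecall_verify_integrity(blob, hashlib.sha256(blob).digest()))  # True
```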
We explicitly modeled UAV-to-ground communication links based on typical broadband wireless connections (Wi-Fi bridge) that are commonly deployed on commercial UAV platforms. The bandwidth was configured to be in the range of 1–10 Mbps, with round-trip times between 5 and 20 ms and packet loss rates from 0 to 5.4%. Each transmission payload was determined by the actual ciphertext and signature sizes of VHEFL. Under these constraints, we measured the per-round uplink and downlink latency and system throughput.
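The measured latencies are consistent with a simple link model in which serialization delay and propagation are inflated by the expected number of transmissions under independent packet losses; in the sketch below, the payload size is a placeholder:

```python
def round_latency_ms(payload_bits: float, bandwidth_mbps: float,
                     rtt_ms: float, loss_rate: float) -> float:
    """Expected one-way latency: serialization delay plus half the RTT,
    inflated by the mean number of transmissions 1 / (1 - p)."""
    serialization_ms = payload_bits / (bandwidth_mbps * 1e6) * 1e3
    return (serialization_ms + rtt_ms / 2) * (1.0 / (1.0 - loss_rate))

# Example: a 2 Mbit ciphertext-plus-signature payload over a 5 Mbps link
# with 10 ms RTT and 2% loss (all within the configured ranges).
print(round(round_latency_ms(2e6, 5, 10, 0.02), 1), "ms")  # ~413.3 ms
```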
The security of CKKS parameters is primarily determined by the polynomial modulus degree N and the total bit length of the coefficient modulus, log q. We selected the parameter set with N = 16,384 and the coefficient modulus configuration summarized in Table 2, which achieved a 128-bit security level and aligned with the NIST recommendations for long-term security. Compared with N = 8,192, this configuration offered stronger resistance to quantum attacks; compared with N = 32,768, it reduced the communication overhead by approximately 75% and the computational overhead by approximately 57% (the inverse of the 300% and 131% increases reported in Table 3), making it more suitable for the bandwidth and computational constraints of UAV edge networks.
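For concreteness, the selected degree can be instantiated with a library such as TenSEAL; the coefficient-modulus split below is an illustrative choice supporting a multiplicative depth of two, not necessarily the exact moduli of Table 2:

```python
import tenseal as ts

# N = 16,384 with an illustrative modulus split ([60, 40, 40, 60] bits)
# supporting a multiplicative depth of two at a 2^40 scale.
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=16384,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40

enc = ts.ckks_vector(context, [0.5, -1.25, 3.0])
prod = enc * enc                 # depth 1: ciphertext-ciphertext multiply
prod = prod * [2.0, 2.0, 2.0]    # depth 2: plaintext multiply
print(prod.decrypt())            # ~[0.5, 3.125, 18.0]
```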
Precision loss in CKKS primarily stems from scaling rounding and noise accumulation. With N = 16,384, the MAE was roughly 3.5× lower than with N = 8,192, which restored the model accuracy from 98.95% to 99.22%. Increasing the multiplicative depth to three degraded the precision by an order of magnitude and reduced the accuracy by 1.38 percentage points, justifying our choice of a depth of two. Although N = 32,768 offered higher precision, it increased the computational and communication overhead by 131% and 300%, respectively, with diminishing returns, making it unsuitable for resource-constrained UAV edge networks. These results are summarized in Table 3.
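The MAE itself can be measured directly as the error of an encrypt-compute-decrypt round trip; in the sketch below, a random vector stands in for model parameters, and the modulus split is again illustrative:

```python
import numpy as np
import tenseal as ts

ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=16384,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40

x = np.random.uniform(-1, 1, size=4096)         # stand-in for model weights
dec = np.array((ts.ckks_vector(ctx, x.tolist()) * x.tolist()).decrypt())
print(f"MAE after one multiplication: {np.mean(np.abs(dec - x * x)):.2e}")
```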
For the RSA-based scheme, a 512-bit prime was used to generate the integrity proof. Both the BLS-based scheme and our scheme used SHA-256 for integrity proof generation together with a 256-bit elliptic curve, which provides security comparable to 3072-bit RSA and DSA while using only 256-bit points. To simulate corrupted model parameter messages, 1% of the data blocks (each 16 KB) were altered to random values based on the local model size.
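The corruption model can be reproduced as follows; the message below is synthetic, while the block size and corruption fraction match the values stated above:

```python
import random
import secrets

BLOCK = 16 * 1024  # 16 KB blocks

def corrupt(message: bytes, fraction: float = 0.01) -> bytes:
    """Overwrite `fraction` of the 16 KB blocks with random bytes."""
    blocks = [bytearray(message[i:i + BLOCK])
              for i in range(0, len(message), BLOCK)]
    for idx in random.sample(range(len(blocks)),
                             max(1, int(len(blocks) * fraction))):
        blocks[idx][:] = secrets.token_bytes(len(blocks[idx]))
    return b"".join(bytes(b) for b in blocks)

msg = secrets.token_bytes(4 * 1024 * 1024)  # synthetic 4 MB parameter message
print(corrupt(msg) != msg)                  # True
```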
6.2.3. Energy Efficiency
In this experiment, the energy efficiency is denoted by η and is defined as the amount of effective information processed per unit of energy, η = I/E, where I represents the effective information completed under a given system scale and E denotes the corresponding total energy consumption. A larger η indicates higher energy utilization efficiency. The horizontal axis n denotes the system scale parameter and characterizes the expansion of the task scale as its value increases. As shown in Figure 3, as n increased, the η of the proposed scheme gradually decreased and then stabilized, indicating that the energy overhead grew slightly faster than the effective information gain as the system scale increased. In contrast, the η values of HE + TEE, HE + Aggsign, and TEE + Aggsign exhibited an increasing trend with respect to n. Nevertheless, throughout the entire tested range, the proposed scheme consistently achieved the highest η, demonstrating that it maintained the best overall energy utilization efficiency across different system scales.
6.2.4. Experimental Analysis of Scheme Usability
Figure 4 illustrates the accuracy performance of the four schemes across different aggregation rounds. As the number of aggregation rounds increased from 20 to 140, the proposed scheme consistently achieved the highest accuracy and exhibited steady improvement, rising from approximately 95.2% to about 99.2%. It not only converged faster but also attained significantly higher final accuracy than the other schemes. In contrast, HE + TEE and HE + Aggsign showed gradual improvements but remained noticeably below the proposed scheme, while TEE + Aggsign remained nearly unchanged and exhibited a limited convergence capability. These results demonstrate that the proposed scheme can more effectively integrate model updates while maintaining security guarantees, thereby accelerating convergence and achieving superior overall performance and robustness.
As shown in Figure 5, the computational overhead of the RSA-based and BLS-based schemes increased rapidly with the sample size. When the sample size reached 400, their detection accuracy approached that of our scheme, but at a computational cost too high for efficient inspection. In contrast, the time consumption of our scheme remained stable as the sample size increased, highlighting its better usability. A detailed analysis of its efficiency is provided in the next section.
Figure 6 illustrates the accuracy of the four schemes under different malicious client ratios ρ (%), evaluating their robustness against adversarial updates. As ρ increased, the accuracy of HE + Aggsign and TEE + Aggsign declined significantly, with TEE + Aggsign experiencing the most severe degradation, indicating high sensitivity to malicious updates. HE + Aggsign also showed a continuous downward trend, suggesting limited robustness under high attack intensities. In contrast, HE + TEE exhibited only a slight decrease in accuracy, demonstrating moderate stability. Notably, the proposed scheme consistently achieved the highest accuracy across the entire range of attack ratios, with only a minimal performance drop. This indicates that the proposed scheme can effectively mitigate the impact of malicious updates and maintain stable model performance, even under strong adversarial conditions.
As shown in Table 4, the per-round communication cost of all schemes remained at a comparable scale for the same number of clients. The per-round cost of our scheme was close to that of the other schemes; although the RSA-based and BLS-based schemes exhibited slightly higher communication overhead, the differences remained marginal. This indicates that the proposed design does not introduce an excessive communication burden compared to existing secure aggregation mechanisms.
Table 5 analyzes the retransmission overhead under unstable air-to-ground links. As the packet loss rate increased across the tested range (0–5.4%), the total communication cost of our scheme rose moderately and approximately linearly with the loss rate, validating the retransmission model. Importantly, even under packet loss, the overall communication cost remained within the same order of magnitude, demonstrating stable behavior in unreliable wireless environments.
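This linear growth is what a geometric retransmission model predicts: each packet is retransmitted until received, so the expected cost is S/(1 − p), which is approximately S(1 + p) for small loss rates p. A short sketch with a placeholder payload size:

```python
def expected_cost_bits(payload_bits: float, loss_rate: float) -> float:
    # Retransmit-until-received: geometric number of attempts with mean
    # 1 / (1 - p), so E[cost] = S / (1 - p) ~= S * (1 + p) for small p.
    return payload_bits / (1.0 - loss_rate)

S = 1e7  # placeholder per-round payload in bits
for p in (0.0, 0.02, 0.054):
    print(f"loss = {p:.1%}: {expected_cost_bits(S, p):,.0f} bits")
```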
Table 6 provides a detailed per-round time breakdown. All schemes exhibited comparable communication time, enclave transition latency, and HE operation time; however, our scheme substantially reduced the time spent in the aggregation and verification phase. As a result, the proposed scheme matched the communication overhead of existing methods while achieving the lowest total system latency, an advantage that stems mainly from its efficient aggregation and verification mechanism.
During the parameter upload phase, each user generated an integrity proof for its model parameter message and sent it to the server. The computational overhead of our scheme scales with the message size, since it generates a proof over the entire message, whereas the RSA- and BLS-based schemes sample only a few data blocks; with a fixed sample size, their overhead remains constant regardless of message size.
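This asymmetry can be illustrated by contrasting a single hash pass over the full message with per-block modular exponentiations over sampled blocks; in the sketch below, the modulus and exponent are random stand-ins for RSA key material, since only the cost shape matters for the comparison:

```python
import hashlib
import random
import secrets
import time

def full_message_proof(msg: bytes) -> bytes:
    # One hash pass over the entire message (full-coverage style).
    return hashlib.sha256(msg).digest()

def sampled_rsa_style_proof(msg: bytes, n: int, d: int,
                            block: int = 16 * 1024, samples: int = 100):
    # One full-size modular exponentiation per sampled block (mimicking
    # a private-key operation) dominates the cost.
    return [pow(int.from_bytes(msg[i * block:(i + 1) * block], "big") % n,
                d, n)
            for i in random.sample(range(len(msg) // block), samples)]

msg = secrets.token_bytes(4 * 1024 * 1024)   # 4 MB message
n = secrets.randbits(1024) | 1               # stand-in for an RSA modulus
d = secrets.randbits(1024) | 1               # stand-in for a private exponent

t0 = time.perf_counter(); full_message_proof(msg)
t1 = time.perf_counter(); sampled_rsa_style_proof(msg, n, d)
t2 = time.perf_counter()
print(f"full hash: {(t1 - t0) * 1e3:.1f} ms, "
      f"sampled modexp: {(t2 - t1) * 1e3:.1f} ms")
```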
Figure 7 shows the average overhead when the message size increased from 4 MB to 128 MB, with 128 users and a sample size of 100. The results indicate that our scheme had significantly lower overhead than the RSA- and BLS-based schemes, taking only 5.7 ms on average, compared with 59.1 ms and 708.2 ms for the RSA and BLS schemes, respectively. This confirms the efficiency of our scheme at the user side.