6.1. Training Behavior and Convergence
All four models—Basic 1D CNN, Multi-Scale CNN, Multi-Channel CNN, and LUMEN—were trained under identical optimization settings to ensure a fair comparison. Overall, all models exhibited stable learning behavior without significant fluctuations, indicating that the adopted training configuration was appropriate for the UAV RF fingerprinting task.
Among the evaluated models, LUMEN showed the most stable convergence trend, and its training process is presented in Figure 5. For brevity, the training and validation curves of the three baseline models are provided in Appendix C.
As shown in Figure 5, both the training and validation loss curves decreased steadily, while the corresponding accuracy curves increased and gradually converged. These results indicate that the proposed model effectively learned discriminative spectral features from the PSD-based inputs.
In addition, the gap between training and validation performance remained relatively small throughout training, suggesting limited overfitting. The validation accuracy reached 97.42%, and early stopping was triggered at epoch 99. Overall, the results demonstrate that the proposed model can be trained stably under the adopted optimization settings.
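The early-stopping behavior mentioned above can be illustrated with a minimal sketch. The patience value, the minimum improvement threshold, and the monitored quantity (validation loss) are assumptions made for illustration; the text reports only that stopping was triggered at epoch 99. The class name and the loss sequence are hypothetical.

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        # Both values are hypothetical; the paper does not report them.
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch and return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Synthetic validation losses: five improving epochs, then a plateau.
val_losses = [1.0, 0.8, 0.6, 0.5, 0.45] + [0.46] * 10
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
```

In this toy run the best epoch is 5 and stopping is triggered three non-improving epochs later, at epoch 8. In practice the weights from the best epoch are usually restored before evaluation.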
6.2. Overall Classification Performance
Table 9 summarizes the classification performance of the four evaluated models. Starting from the Basic 1D CNN baseline, additional architectural components were progressively introduced to analyze the effects of multi-scale feature extraction and multi-channel input processing.
Multi-Scale CNN improved over the baseline model, indicating that multiple receptive fields helped capture diverse spectral characteristics from the PSD representation. Multi-Channel CNN achieved further improvement, suggesting that adjacent stacked PSD segments provided useful short-term spectral continuity information.
Among all evaluated architectures, LUMEN, which combines both multi-scale and multi-channel design strategies, achieved the best overall performance, with accuracy, precision, recall, and F1-score all reaching 0.975. Overall, the results demonstrate a consistent performance improvement as the architecture evolved from the baseline model to the proposed model.
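For reference, the four reported metrics can be derived from a confusion matrix as sketched below. Macro averaging over classes is an assumption, since the averaging scheme is not stated in the text, and the 3 × 3 matrix is a synthetic example rather than data from this study. The function name is hypothetical.

```python
import numpy as np


def metrics_from_confusion(cm):
    """Accuracy and macro-averaged precision, recall, and F1 from a confusion matrix.

    Rows are true classes and columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    support = cm.sum(axis=1)      # true samples per class
    predicted = cm.sum(axis=0)    # predicted samples per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }


# Synthetic three-class example (not results from the paper).
toy_cm = [[9, 1, 0],
          [1, 8, 1],
          [0, 1, 9]]
scores = metrics_from_confusion(toy_cm)
```

When the classes are balanced, accuracy equals macro-averaged recall, so closely matching values across the four metrics are expected for a roughly balanced test set.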
6.3. Confusion Matrix and Class-Wise Analysis
Figure 6 presents the confusion matrix of the proposed LUMEN, while the corresponding results for the baseline architectures are provided in Appendix D. Overall, most UAV classes were classified correctly, with only limited off-diagonal errors observed across the confusion matrix.
The most noticeable confusion occurred among UAV04, UAV05, and UAV06. One possible explanation is that UAV01–UAV06 are all DJI platforms and therefore may share similar communication protocols, hardware characteristics, and transmission behaviors. In particular, UAV04, UAV05, and UAV06 were monitored within the same 2.43 GHz frequency range, which may produce similar spectral occupancy and sideband patterns in the PSD representation.
In addition, the dataset includes multiple flight states, including moving conditions with pitch, roll, and yaw maneuvers. These operating conditions can introduce intra-class spectral variation within the same UAV model. Such dataset characteristics may partially explain the localized confusion among spectrally similar DJI UAVs.
Nevertheless, the overall confusion pattern indicates that LUMEN effectively learned discriminative spectral features for most UAV classes and maintained strong class-wise classification performance.
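The class-wise reading of a confusion matrix described above can be automated by ranking its off-diagonal entries. The sketch below is illustrative only: the counts are synthetic and do not reproduce Figure 6, and the function name is hypothetical.

```python
import numpy as np


def top_confusions(cm, labels, k=3):
    """Return the k largest off-diagonal entries as (true, predicted, count)."""
    off = np.array(cm, dtype=int)      # copy so the input is not modified
    np.fill_diagonal(off, 0)           # ignore correct classifications
    order = np.argsort(off, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(order, off.shape)
    return [
        (labels[r], labels[c], int(off[r, c]))
        for r, c in zip(rows, cols)
        if off[r, c] > 0
    ]


# Synthetic counts for three classes (not the values behind Figure 6).
labels = ["UAV04", "UAV05", "UAV06"]
toy_cm = [[90, 7, 3],
          [5, 93, 2],
          [1, 4, 95]]
pairs = top_confusions(toy_cm, labels)
```

Listing the pairs in both directions (true to predicted) makes asymmetric confusion visible, which a symmetric summary would hide.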
6.4. Latent Feature Space Analysis
To evaluate the quality of the learned feature representations, the latent feature spaces of the four evaluated models were analyzed using clustering metrics.
Table 10 summarizes the quantitative results, including the Silhouette Score, Davies–Bouldin Index (DBI), and Calinski–Harabasz Index (CHI).
Among the evaluated models, LUMEN achieved the highest Silhouette Score (0.181) and CHI (305.558), along with the lowest DBI (2.242). These results indicate improved cluster compactness and inter-class separability compared with the baseline architectures.
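The three clustering metrics can be computed from latent features and class labels with scikit-learn, as in the following sketch. The feature matrix here is synthetic Gaussian data, and the dimensionality and number of classes are placeholders; how the embeddings were extracted from each model is not specified in this section and is not assumed.

```python
import numpy as np
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic stand-in for latent features: 4 classes, 50 samples each, 16 dims.
rng = np.random.default_rng(0)
n_classes, per_class, dim = 4, 50, 16
centers = rng.normal(scale=5.0, size=(n_classes, dim))
features = np.vstack([c + rng.normal(size=(per_class, dim)) for c in centers])
labels = np.repeat(np.arange(n_classes), per_class)

# Ground-truth class labels are used as the cluster assignment.
cluster_quality = {
    "silhouette": silhouette_score(features, labels),   # higher is better, in [-1, 1]
    "dbi": davies_bouldin_score(features, labels),      # lower is better, >= 0
    "chi": calinski_harabasz_score(features, labels),   # higher is better
}
```

Because CHI is not bounded and depends on sample count and dimensionality, it is most meaningful when comparing models evaluated on the same test set, as done in Table 10.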
Figure 7 presents the t-SNE visualization of the feature embeddings generated by LUMEN. Overall, the 14 UAV classes formed relatively well-separated clusters with limited overlap. Although localized mixing was observed among several DJI UAV classes, the overall feature distribution demonstrates meaningful inter-class separation.
The partial overlap among DJI UAV clusters is consistent with the confusion matrix analysis discussed in Section 6.3. Because several DJI UAVs share similar communication protocols and were collected within similar operating frequency ranges, their PSD characteristics may become partially overlapped in the latent feature space. Nevertheless, most UAV classes remained distinguishable, indicating that the proposed model learned robust and discriminative spectral representations.
For comparison, the t-SNE visualizations of the baseline models are provided in Appendix E.
6.5. Edge Deployment Evaluation
6.5.1. NPU-Specific Model Optimization and Deployment
To enable real-time inference on the target edge platform, the proposed LUMEN model was deployed on the Rockchip RK3582 NPU using an ONNX-to-RKNN conversion workflow.
The trained model was first exported to the ONNX format [34]. During this process, several implementation details were adjusted to ensure compatibility with the RKNN execution environment. In particular, the deployed model used a fixed input configuration with a stacked PSD tensor of size 1 × 8192 × 3.
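To make the fixed input shape concrete, the sketch below builds a 1 × 8192 × 3 tensor from three adjacent signal segments. It is not the preprocessing pipeline of this study: the 8192-point FFT, the Hann window, the dB scaling, and the use of consecutive non-overlapping segments are all assumptions chosen to match the stated tensor size, and the function name is hypothetical.

```python
import numpy as np

N_FFT = 8192    # assumed to match the 8192-bin axis of the input tensor
N_STACK = 3     # three adjacent segments, one per channel


def stacked_psd(iq, n_fft=N_FFT, n_stack=N_STACK):
    """Return a float32 array of shape (1, n_fft, n_stack) from complex IQ samples."""
    needed = n_fft * n_stack
    if iq.size < needed:
        raise ValueError(f"need at least {needed} samples, got {iq.size}")
    # Split into consecutive, non-overlapping segments (an assumption).
    segments = iq[:needed].reshape(n_stack, n_fft)
    window = np.hanning(n_fft)
    spectrum = np.fft.fftshift(np.fft.fft(segments * window, axis=1), axes=1)
    power = np.abs(spectrum) ** 2 / np.sum(window ** 2)
    psd_db = 10.0 * np.log10(power + 1e-12)   # small offset avoids log(0)
    # (n_stack, n_fft) -> (1, n_fft, n_stack)
    return psd_db.T[np.newaxis, :, :].astype(np.float32)


# Synthetic complex noise stands in for recorded IQ samples.
rng = np.random.default_rng(0)
iq = rng.normal(size=N_STACK * N_FFT) + 1j * rng.normal(size=N_STACK * N_FFT)
x = stacked_psd(iq)
```

Fixing the shape at export time means every inference call must supply exactly this layout, so any resampling or padding has to happen before the tensor is handed to the runtime.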
The exported ONNX model was subsequently converted using RKNN-Toolkit2 with RK3582 specified as the target platform. After conversion, inference was executed directly on the NPU so that most computational operations could be offloaded from the CPU.
Based on this deployment configuration, runtime behavior was analyzed in terms of inference latency, throughput, hardware resource usage, and long-term operational stability.
6.5.2. Inference Latency and Throughput Analysis
To evaluate whether the deployed model can support real-time operation on the target edge device, a continuous 30 min inference benchmark was conducted on the RK3582 NPU. The benchmark used PSD-based input tensors with a shape of 1 × 8192 × 3, and a total of 993,346 inference iterations were processed during the experiment.
Figure 8 shows the latency trend throughout the benchmark. Across nearly one million iterations, the inference latency remained close to 1.7 ms, indicating stable long-duration execution. Slightly higher latency values were observed during the initial stage, but the overall variation remained small and no significant warm-up effect was evident.
The average inference latency was 1.73 ms, which was identical to the median latency. The 95th and 99th percentile latencies were 1.76 ms and 1.94 ms, respectively, indicating that most inferences were completed within a narrow latency range.
Only a limited number of outliers appeared during the benchmark. Specifically, 23 iterations exceeded 3 ms, and only 2 iterations exceeded 5 ms. Although the maximum observed latency reached 50.77 ms, this occurred only once during the entire experiment and did not affect the overall execution stability.
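The latency statistics reported above (mean, median, tail percentiles, and outlier counts) can be computed from a per-iteration latency log as sketched below. The samples are synthetic and only mimic the general shape of the measurements; the function name and the log format are hypothetical.

```python
import numpy as np


def latency_summary(latencies_ms, thresholds_ms=(3.0, 5.0)):
    """Summarize per-iteration latencies given in milliseconds."""
    lat = np.asarray(latencies_ms, dtype=float)
    summary = {
        "mean": lat.mean(),
        "median": np.median(lat),
        "p95": np.percentile(lat, 95),
        "p99": np.percentile(lat, 99),
        "max": lat.max(),
    }
    # Count iterations that exceed each outlier threshold.
    for t in thresholds_ms:
        summary[f"count_over_{t:g}ms"] = int((lat > t).sum())
    return summary


# Synthetic latencies clustered near 1.73 ms with three injected outliers.
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.73, scale=0.02, size=100_000)
samples[:3] = [3.5, 6.0, 50.0]
stats = latency_summary(samples)
```

A single large outlier shifts the maximum but barely moves the mean or median over many iterations, which is why percentiles and threshold counts describe tail behavior better than the maximum alone.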
Table 11 summarizes the inference latency and throughput results obtained during the benchmark.
Based on the measured average latency, the deployed model achieved a throughput of 578.95 FPS. This processing rate is substantially higher than the update frequency typically required for UAV RF identification, providing sufficient computational margin for continuous operation and additional processing tasks on the same platform.
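The relation between the reported latency and throughput is simple arithmetic, sketched below using only the figures quoted in this section. The one assumption is that the benchmark ran for exactly 30 min of wall-clock time, which is used to estimate the end-to-end loop rate.

```python
# Figures quoted in the text.
reported_fps = 578.95
reported_mean_ms = 1.73
iterations = 993_346

# Throughput and mean latency are reciprocals: FPS = 1000 / latency_ms.
implied_mean_ms = 1000.0 / reported_fps        # latency implied by the reported FPS
fps_from_rounded = 1000.0 / reported_mean_ms   # FPS implied by the rounded mean

# Assumption: the benchmark lasted exactly 30 min of wall-clock time.
wall_clock_s = 30 * 60
loop_rate = iterations / wall_clock_s                  # iterations per second
loop_period_ms = 1000.0 * wall_clock_s / iterations    # wall-clock time per iteration
```

The reported 578.95 FPS corresponds to an unrounded mean of roughly 1.727 ms, which rounds to the stated 1.73 ms; dividing by the rounded value instead gives about 578.0 FPS. Under the 30 min assumption, the loop completed about 552 iterations per second, or about 1.81 ms per iteration. If that assumption holds, the small difference from the mean inference latency would correspond to per-iteration work outside the timed inference call.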
Overall, the latency benchmark demonstrates that the NPU-deployed LUMEN model can sustain stable real-time inference on the Rockchip RK3582 processor (Rockchip Electronics Co., Ltd., Fuzhou, China).
6.5.3. Hardware Resource Efficiency and Thermal Robustness in Edge Environments
In addition to latency performance, hardware-level behavior was further analyzed during the same 30 min benchmark.
Figure 9 presents the temporal trends of CPU usage and process memory usage, while Table 12 summarizes the corresponding hardware resource and thermal statistics.
As shown in Figure 9, CPU utilization remained low and stable throughout the benchmark. The average CPU usage was 7.05%, indicating that most inference operations were handled by the NPU, while the CPU was mainly responsible for auxiliary runtime tasks.
The process RSS remained within a bounded range of approximately 110–190 MB and did not exhibit a continuously increasing trend over time, indicating stable memory behavior during long-duration inference.
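One simple way to check for a continuously increasing memory trend is to fit a line to the RSS samples and inspect its slope. The sketch below uses two synthetic traces, one bounded and one steadily growing, over the same 110–190 MB range; the sampling interval and trace shapes are assumptions, and this is not necessarily how the trend was assessed in this study.

```python
import numpy as np


def rss_trend_mb_per_min(t_seconds, rss_mb):
    """Least-squares slope of RSS over time, expressed in MB per minute."""
    slope_per_s, _intercept = np.polyfit(t_seconds, rss_mb, deg=1)
    return slope_per_s * 60.0


# Synthetic traces sampled every 5 s over 30 min (hypothetical sampling rate).
t = np.arange(0, 1800, 5.0)
bounded = 150 + 40 * np.sin(2 * np.pi * t / 120)   # oscillates within 110-190 MB
leaking = 110 + 80 * t / t[-1]                     # grows steadily from 110 to 190 MB

bounded_trend = rss_trend_mb_per_min(t, bounded)
leaking_trend = rss_trend_mb_per_min(t, leaking)
```

Both traces span the same range, but only the second has a clearly positive slope, so the range alone does not distinguish bounded fluctuation from a leak.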
Figure 10 presents the temperature profiles of the SoC, little CPU core, and NPU during execution. After the initial increase from the idle state, all measured temperatures reached stable operating ranges and remained nearly constant throughout the remainder of the benchmark.
As summarized in Table 12, the average SoC and NPU temperatures were 40.65 °C and 40.32 °C, respectively. No noticeable thermal instability or overheating behavior was observed during continuous execution.
Overall, the results demonstrate that the deployed LUMEN model maintained low CPU overhead, bounded memory usage, and stable thermal behavior during continuous real-time inference on the RK3582 platform.
6.6. Discussion and Limitations
The experimental results indicate that PSD-based spectral representations can effectively distinguish UAV RF characteristics across multiple UAV categories and flight states. The qualitative PSD analysis and classification results suggest that spectral occupancy patterns, sideband-related structures, and state-dependent spectral variations contribute to inter-class separability within the proposed dataset.
In particular, the inclusion of commercial DJI platforms, hobbyist drones, and custom-built UAVs allowed the proposed approach to learn diverse RF spectral characteristics associated with different communication behaviors and hardware configurations. Furthermore, the moving flight condition, which included pitch, roll, and yaw maneuvers, introduced additional temporal and spectral variability that partially reflects dynamic UAV operating conditions. These dataset attributes likely contributed to the robustness of the learned spectral representations under varying operating states, although their individual effects were not isolated experimentally.
The experimental results also demonstrate that lightweight PSD-based representations combined with multi-scale and multi-channel processing can provide an effective balance between classification performance and real-time edge deployment capability. The proposed model achieved stable runtime behavior on the RK3582 NPU while maintaining relatively low model complexity, demonstrating the practical feasibility of lightweight RF fingerprinting systems for edge computing environments.
Nevertheless, several limitations should be acknowledged.
First, the dataset was collected in an electromagnetic anechoic chamber. Although this environment improves reproducibility and enables clearer observation of UAV-specific spectral characteristics, it does not fully represent real-world outdoor RF environments. In practical deployment scenarios, RF signals may be affected by multipath fading, non-line-of-sight propagation, external interference, and dynamic channel variations, which may influence the robustness and generalization capability of PSD-based UAV identification systems.
Second, several important RF impairments were not systematically modeled or evaluated. Although signal strength variation was partially considered through multiple indoor gain settings and moving flight conditions, the present study did not explicitly incorporate additive noise, co-channel interference, Doppler shifts caused by UAV motion, time-varying fading effects, orientation-dependent channel variation, or long-term hardware characteristic drift. While the proposed dataset partially reflects mobility-related signal variation, the current approach does not explicitly analyze localization-induced variability or complex spatial channel dynamics encountered in real-world deployment conditions.
Finally, the current study provides limited sensitivity analysis for several architectural and preprocessing parameters, including FFT size, segment duration, channel stacking configuration, and kernel size selection in the multi-scale branches. These parameters were empirically selected based on preliminary experiments and computational constraints for lightweight edge deployment. In particular, the three-channel stacking strategy was designed to preserve short-term spectral continuity while maintaining low computational overhead, whereas the multi-scale kernel sizes were selected to capture spectral patterns at different receptive fields. Although the proposed configuration achieved stable performance in the evaluated environment, broader ablation and sensitivity analyses would further improve the interpretability of the architectural design choices and provide deeper insight into the trade-offs between classification accuracy and runtime efficiency.
Future work will focus on extending the dataset to outdoor and interference-rich environments. Additional studies will investigate more realistic RF channel impairments, dynamic spatial conditions, and broader parameter sensitivity analyses to further improve the robustness and practical applicability of lightweight UAV RF fingerprinting systems.