MD-Net: A Lightweight Dual-Branch Network with Adaptive Time-Frequency Masking for Robust UAV RF Signal Classification
Abstract
1. Introduction
- We propose an Adaptive Time-Frequency Masking mechanism for RF physical fading. Unlike traditional data augmentation in computer vision, this approach is tightly tied to the physical properties of radio wave propagation. It adaptively masks certain frequency bands and time segments in the feature map. In doing so, it simulates “frequency-selective fading” and “burst signal truncation” seen in real-world electromagnetic environments. This targeted augmentation strategy improves the model’s robustness to non-stationary background noise.
- We construct an ultra-lightweight dual-branch fusion architecture based on time-frequency feature decoupling. Conventional approaches map RF signals to images and process them with high-complexity convolutional networks (such as 2D-CNNs). MD-Net instead decouples the physical features: an MLP branch compresses the static envelope features of the global spectrum, while an LSTM branch captures the dynamic, hopping-related dependencies introduced by communication protocols. This heterogeneous fusion preserves correlations across the time-frequency domain and improves UAV signal discrimination under complex backgrounds, while using only 0.32 M parameters.
2. Relevant Research
3. Introduction to the MD-Net Model
3.1. Overall Structure of the MD-Net Model
3.2. Adaptive Time-Frequency Masking Module
- RF-IF-AutoMask enhancement: The core energy region of the signal is located by computing the instantaneous frequency, and the signal amplitude outside this core region is locally occluded so that the key features of the signal are not damaged. The occlusion is controlled by two parameters. The time occlusion parameter is the number of contiguous units that the occluded area spans along the time axis of the time-frequency map, and the frequency occlusion parameter is the number of contiguous units it spans along the frequency axis. Together they determine the shape and size of the time-frequency mask matrix, which consists of several rectangular regions of that size distributed around the instantaneous main-frequency trajectory. The occluded area is then replaced with complex Gaussian noise to generate the enhanced sample, as given in Equation (2). In Equation (2), the binary mask matrix is determined by the two occlusion parameters, and the amplitude of the complex Gaussian noise is adaptively scaled by a scale factor that controls the ratio of noise amplitude to signal amplitude. Complex Gaussian noise is chosen because the time-frequency map is complex-valued (it carries both amplitude and phase information), so a complex-valued replacement avoids the phase distortion that local replacement would otherwise cause. The occluded area is capped at 20% of the total signal area so that the main features of the signal are preserved.
- AFH enhancement: A smooth random phase perturbation is applied to the signal to simulate frequency drift, as given in Equation (3). Equation (3) combines the original RF complex baseband signal with a time-varying phase modulation term, whose phase perturbation sequence is generated by a smooth random process, to produce the enhanced signal. Frequency drift is thus simulated indirectly through a progressive phase offset.
- PNT enhancement: Smooth noise is superimposed on the signal phase to simulate real channel phase variation, as given in Equation (4). The random phase modulation term acts only on the signal phase and leaves the amplitude characteristics unchanged, and the phase noise component follows a zero-mean Gaussian distribution. The enhanced signal therefore reproduces a channel degradation mode in which only the phase is disturbed.
- PSD gated screening [19]: To keep the spectral characteristics of the enhanced samples consistent with the original signal, a power spectral density (PSD) similarity gate is introduced that retains only samples whose spectral difference is below a threshold, as given in Equation (5). The difference is measured as the Euclidean norm of the gap between the PSD of the original signal and the PSD of the enhanced signal.
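
The four steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function and parameter names are hypothetical, the rectangles are placed uniformly at random rather than along the instantaneous-frequency trajectory, the noise scale is derived from an assumed SNR in dB, the "smooth random process" is approximated by a moving average of Gaussian noise, and the PSD is a plain periodogram compared without normalization.

```python
import numpy as np


def mask_with_noise(tf_map, rng, time_width=10, freq_width=6,
                    n_boxes=3, snr_db=40.0, max_ratio=0.2):
    """Fill a few rectangles of a complex time-frequency map with scaled noise."""
    n_freq, n_time = tf_map.shape
    mask = np.zeros((n_freq, n_time), dtype=bool)
    for _ in range(n_boxes):
        # Assumption: random placement instead of IF-guided placement.
        f0 = rng.integers(0, n_freq - freq_width + 1)
        t0 = rng.integers(0, n_time - time_width + 1)
        trial = mask.copy()
        trial[f0:f0 + freq_width, t0:t0 + time_width] = True
        if trial.mean() <= max_ratio:  # keep the occluded share under the cap
            mask = trial
    # Unit-RMS complex Gaussian noise, scaled relative to the signal RMS.
    noise = (rng.standard_normal(tf_map.shape)
             + 1j * rng.standard_normal(tf_map.shape)) / np.sqrt(2)
    signal_rms = np.sqrt(np.mean(np.abs(tf_map) ** 2))
    noise = noise * signal_rms * 10 ** (-snr_db / 20)
    return np.where(mask, noise, tf_map), mask


def smooth_phase(n, rng, sigma=0.05, window=16):
    """Zero-mean Gaussian noise smoothed by a moving average (assumed smoother)."""
    raw = rng.normal(0.0, sigma, n)
    kernel = np.ones(window) / window
    return np.convolve(raw, kernel, mode="same")


def afh(x, rng):
    """Frequency-drift style augmentation: accumulate a smooth phase increment."""
    drift = smooth_phase(len(x), rng)
    return x * np.exp(1j * np.cumsum(drift))


def pnt(x, rng):
    """Phase-noise augmentation: perturb the phase only, amplitude is untouched."""
    return x * np.exp(1j * smooth_phase(len(x), rng))


def psd(x):
    """Periodogram estimate of the power spectral density."""
    return np.abs(np.fft.fft(x)) ** 2 / len(x)


def psd_gate(x, x_aug, threshold):
    """Keep an augmented sample only if its PSD stays close to the original."""
    return bool(np.linalg.norm(psd(x) - psd(x_aug)) < threshold)
```

Both phase augmentations multiply by a unit-magnitude complex exponential, so the amplitude of every sample is preserved by construction. A suitable gate threshold depends on signal length and power and would have to be tuned on the data.
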
3.3. MLP-LSTM Dual-Branch Network Structure
1. Input layer: Both branches share the same input, namely the RF feature vector produced by data preprocessing and adaptive time-frequency occlusion. The raw RF signal of the unmanned aerial vehicle (UAV) is first passed through a discrete Fourier transform to obtain a high-dimensional frequency-domain amplitude feature with approximately 5000 frequency components. Z-score standardization is then applied to remove scale differences among the frequency components, and Principal Component Analysis (PCA) is applied in the input layer of the MD-Net model for dimensionality reduction. As an unsupervised linear dimensionality reduction algorithm, PCA retains 98% of the valid discriminant information through covariance matrix computation, eigendecomposition, and principal component selection. Before PCA, the data are high-dimensional, redundant, and strongly correlated, which raises the model's computational cost and encourages overfitting. After PCA, the low-dimensional vector carries less redundancy and noise and is cheaper to process, while the key information is retained. The result is a one-dimensional feature vector $x \in \mathbb{R}^{D}$ of dimension D, where $\mathbb{R}$ denotes the real number field. This vector is the common input to the MLP and LSTM branches and provides a unified data basis for dual-branch feature encoding.
2. MLP static feature extraction branch: The MLP branch extracts global, static discriminative features from the RF signal. It takes the input feature vector x directly, with no additional flattening, and applies nonlinear mapping and feature compression layer by layer through a fully connected network. The network has three fully connected layers with 256, 128, and 64 units in sequence. Each layer uses the ReLU activation function, which adds nonlinear expressiveness and mitigates the vanishing gradient problem. Forward propagation follows Equation (6), in which the ReLU activation is applied to each layer's trainable weight and bias parameters to produce the output feature vector of the i-th fully connected layer. Through layer-by-layer compression and feature recombination, the MLP branch outputs a 64-dimensional static feature vector that characterizes the stable differences among UAV types at the level of the overall spectral distribution.
3. LSTM dynamic feature modeling branch: The LSTM branch models sequential dependencies among the RF features. Its goal is not to model a physical time series but to use the LSTM's sequence-modeling structure to explore dependencies between feature dimensions. The input to this branch is the one-dimensional feature vector of dimension D obtained after standardization and PCA. Because an LSTM requires sequence input, the vector is first reconstructed with a Reshape operation: it is divided into T contiguous feature sub-segments, each containing F feature dimensions, with $T \times F = D$ so that the total number of features is unchanged and no information is lost. The result is a two-dimensional pseudo-sequence X with T time steps and F features per step, as shown in Equation (7). This structured pseudo-sequence is inspired by communication framing and is not a true physical time sequence; it adapts the data to the LSTM's input format while exposing the underlying protocol characteristics. A bidirectional LSTM (BiLSTM) with 64 hidden units then encodes X, modeling the associations between feature subspaces in both the forward and reverse directions and producing a 128-dimensional high-order feature representation. To compress redundant information and ease feature fusion, a fully connected compression layer (Dense 32, ReLU) follows the BiLSTM and maps this representation to a compact 32-dimensional feature vector. This vector characterizes the correlation structure among the RF features and complements the MLP branch features in the subsequent fusion.
4. Feature fusion and classification output layer: The 64-dimensional static features from the MLP branch are concatenated with the 32-dimensional dynamic features from the LSTM branch (96 dimensions in total) and fed to the Softmax classifier to obtain the final prediction. The calculation is shown in Equation (8), which applies the trainable parameters of the classification layer to the concatenated features to produce the final multi-class prediction output.
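
The input pipeline and layer sizes described above can be sketched in NumPy. This is an illustrative sketch under assumptions, not the authors' code: the function names are hypothetical, "retains 98%" is interpreted as 98% cumulative explained variance, and the parameter count uses the common single-bias LSTM convention (as in Keras), which may differ from the framework actually used. The values of D, T, and F are not given in this section, so any numbers passed in are placeholders.

```python
import numpy as np


def zscore(X, eps=1e-12):
    """Standardize each feature column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)


def pca_fit_transform(X, keep=0.98):
    """Project onto the fewest principal components explaining `keep` of the variance."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    d = min(int(np.searchsorted(ratio, keep)) + 1, len(s))
    return Xc @ vt[:d].T, d


def to_pseudo_sequence(x, T):
    """Reshape a length-D vector (or a batch of them) into T steps of F = D / T features."""
    D = x.shape[-1]
    if D % T != 0:
        raise ValueError("D must be divisible by T so that T * F = D")
    return x.reshape(*x.shape[:-1], T, D // T)


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def lstm_params(n_in, units):
    """Four gates, each with input weights, recurrent weights, and one bias vector."""
    return 4 * (units * (n_in + units) + units)


def md_net_params(D, F, n_classes, mlp=(256, 128, 64), lstm_units=64, squeeze=32):
    """Trainable parameters of the dual-branch layout described in the text."""
    total, n_in = 0, D
    for n_out in mlp:                                    # MLP branch: 256 -> 128 -> 64
        total += dense_params(n_in, n_out)
        n_in = n_out
    total += 2 * lstm_params(F, lstm_units)              # BiLSTM: two directions
    total += dense_params(2 * lstm_units, squeeze)       # Dense 32 after the BiLSTM
    total += dense_params(mlp[-1] + squeeze, n_classes)  # Softmax layer on 64 + 32 features
    return total
```

For example, `md_net_params(D, F, n_classes=4)` shows how the total depends on the PCA output dimension D and the sub-segment width F; with these layer sizes the first MLP layer, which scales with D, accounts for most of the parameters.
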
4. Experimental Setup
4.1. Introduction to the Dataset
4.1.1. DroneRF Dataset
4.1.2. DroneRFa Dataset
4.2. Experimental Environment and Parameters
4.3. Experimental Evaluation Index
4.4. Model Implementation and Training
5. Experimental Results and Analysis
5.1. Model Index Evaluation
5.2. Model Efficiency Evaluation
5.3. Model Comparison Experiment
5.4. Model Ablation Experiment
5.5. Model Generalization Experiment
6. Conclusions
- The Adaptive Time-Frequency Masking module increases the diversity of training samples without adding model parameters, which improves the model's robustness in complex interference environments. In the ablation experiment, adding this module to the baseline raises average accuracy by 3.78 percentage points (from 80.31% to 84.09%) and reduces the fluctuation of cross-validation performance.
- The MLP-LSTM dual-branch structure captures both the static amplitude characteristics of the RF signal and the sequential dependencies among its features, which strengthens the feature representation and raises accuracy by 3.69 percentage points (from 80.31% to 84.00%).
- With both modules integrated, the average accuracy of the model reaches 85.58%, which is 5.27 percentage points above the 80.31% of the baseline model. The combined gain exceeds either single-module gain, indicating that the two modules are complementary and that together they improve the stability of RF signal identification in real-world interference environments.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Wang, H.; Luo, J.; Wang, H. A review of radio frequency fingerprint recognition methods for unmanned aerial vehicles. Radio Eng. 2024, 54, 2672–2684.
- Shi, Z.; Chang, X.; Yang, C. An acoustic-based surveillance system for amateur drones detection and localization. IEEE Trans. Veh. Technol. 2020, 69, 2731–2739.
- Zhang, Z.; Shi, Z.; Gu, Y. Ziv–Zakai bound for DOAs estimation. IEEE Trans. Signal Process. 2023, 71, 136–149.
- Khan, M.A.; Menouar, H.; Eldeeb, A. On the detection of unauthorized drones—Techniques and future perspectives: A review. IEEE Sens. J. 2022, 22, 11439–11455.
- Wang, Q.; Zhou, H. Research on unmanned aerial vehicle detection technology. Eng. Constr. 2024, 7, 58.
- Alam, S.S.; Chakma, A.; Rahman, M.H. RF-enabled deep-learning-assisted drone detection and identification: An end-to-end approach. Sensors 2023, 23, 4202.
- Aouladhadj, D.; Kpre, E.; Deniau, V. Drone detection and tracking using RF identification signals. Sensors 2023, 23, 7650.
- Kilic, R.; Kumbasar, N.; Oral, E.A. Drone classification using RF signal based spectral features. Eng. Sci. Technol. Int. J. 2022, 28, 101028.
- Yang, L.; Camtepe, S.; Gao, Y. Robustness and security enhancement of radio frequency fingerprint identification in time-varying channels. arXiv 2024, arXiv:2410.07591.
- Tiras, F.E.; Altinoluk, H.S. CrossRF: A domain-invariant deep learning approach for RF fingerprinting. arXiv 2025, arXiv:2505.18200.
- Yu, N.; Mao, S.; Zhou, C. DroneRFA: A large-scale UAV radio frequency signal dataset for detecting low-altitude UAVs. J. Electron. Inf. Technol. 2024, 46, 1147–1156.
- Zhou, X.; Tang, D.; Cai, Y. Radio frequency fingerprint extraction method for identity recognition of wireless devices. Integr. Circuits Embed. Syst. 2024, 24, 69–72.
- Su, Z.; Yan, X.; Han, B. Real-time detection method of unmanned aerial vehicle radio frequency signal under low signal-to-noise ratio conditions. Signal Process. 2023, 39, 919–928.
- Al-Sa’d, M.F.; Al-Ali, A.; Mohamed, A. RF-based drone detection and identification using deep learning approaches: An initiative towards a large open source drone database. Future Gener. Comput. Syst. 2019, 100, 86–97.
- Allahham, M.H.D.S.; Al-Sa’d, M.F.; Al-Ali, A. DroneRF dataset: A dataset of drones for RF-based detection, classification and identification. Data Brief 2019, 26, 104313.
- Swinney, C.J.; Woods, J.C. Unmanned aerial vehicle flight mode classification using convolutional neural network and transfer learning. In Proceedings of the 16th International Computer Engineering Conference (ICENCO), Cairo, Egypt, 29–30 December 2020; pp. 83–87.
- Bai, H.; Li, S.; Jia, Y. Radio signal recognition using two-stage spatiotemporal network with bispectral analysis. Sensors 2025, 25, 5449.
- Choudhary, P.; Sihag, V.; Choudhary, G. DRIFTER: A drone identification technique using RF signals. Forensic Sci. Int. Digit. Investig. 2025, 54, 301948.
- Fang, M.; Pani, S.; Di Fulvio, A. Enabling PSD-capability for a high-density channel imager. In Proceedings of the IEEE Nuclear Science Symposium and Medical Imaging Conference, Piscataway, NJ, USA, 16–23 October 2021; pp. 1–4.
- Alqodah, M.A.; Tahsin, M.; Omari, M.H. RF-based lightweight machine learning for comprehensive drone activity classification. In Proceedings of the 2nd International Conference on Artificial Intelligence, Blockchain, and Internet of Things, Mt. Pleasant, MI, USA, 7–8 September 2024; pp. 1–5.
- Al-Emadi, S.; Al-Senaid, F. Drone detection approach based on radio-frequency using convolutional neural network. In Proceedings of the IEEE International Conference on Informatics, IoT, and Enabling Technologies, Doha, Qatar, 2–5 February 2020; pp. 29–34.
- Akter, R.; Doan, V.S.; Lee, J.M. CNN-SSDI: Convolution neural network inspired surveillance system for UAVs detection and identification. Comput. Netw. 2021, 201, 108519.

| Parameter | Numerical Value |
|---|---|
| Epochs | 30 |
| Batch Size | 10 |
| Dropout Rate | 0.2 |
| Optimizer | Adam |
| Loss Function | Categorical Cross-Entropy |
| Learning Rate | 0.001 |
| MLP Hidden Layers | [256, 128, 64] |
| LSTM Units | 64 |
| Time Mask Param | 10 |
| Frequency Mask Param | 6 |
| Augmentation Noise SNR | 40 dB |

| Type | F1-Score | Recall | Precision |
|---|---|---|---|
| Background | 1.00 | 1.00 | 1.00 |
| Bebop | 0.84 | 0.98 | 0.73 |
| AR | 0.87 | 0.78 | 0.98 |
| Phantom | 0.52 | 0.36 | 0.97 |
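
As a consistency check on the per-class table above, F1 can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only; the function and variable names are hypothetical, and because the table reports precision and recall to two decimals, the recomputed F1 can differ from the reported value in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (reported F1, recall, precision) copied from the per-class table.
per_class = {
    "Background": (1.00, 1.00, 1.00),
    "Bebop": (0.84, 0.98, 0.73),
    "AR": (0.87, 0.78, 0.98),
    "Phantom": (0.52, 0.36, 0.97),
}

# Absolute gap between the reported F1 and the value recomputed from the
# rounded precision and recall.
deviation = {
    name: abs(f1_score(precision, recall) - reported)
    for name, (reported, recall, precision) in per_class.items()
}
```

The low Phantom F1 follows directly from the formula: a recall of 0.36 dominates the harmonic mean even though precision is 0.97.
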

| Model | Params (M) | FLOPs (M) | Inference Time (ms) | Inference Peak Memory Usage (MB) |
|---|---|---|---|---|
| MLP | 0.30 | 0.59 | 23.20 | 2.14 |
| MD-Net | 0.32 | 4.35 | 27.32 | 3.33 |

| Model Name | Accuracy | F1-Score | Recall | Precision | Params | FLOPs | Inference Time |
|---|---|---|---|---|---|---|---|
| MLP * | 80.31% | 79.56% | 80.43% | 84.02% | 0.30 M | 0.59 M | 23.20 ms |
| Existing CNN | 61.80% | 59.00% | 60.00% | 86.00% | 2.03 M | 10.32 M | 26.30 ms |
| LSTM | 62.90% | 60.00% | 62.00% | 84.00% | – | – | – |
| RandomForest | 81.37% | 73.80% | 72.38% | 89.05% | – | – | – |
| Transformer | 84.47% | 79.01% | 76.55% | 90.03% | 0.07 M | 6.17 M | 25.78 ms |
| DNN | 84.52% | 78.81% | 76.43% | 91.08% | 0.16 M | 0.32 M | 24.70 ms |
| 1D-CNN | 85.45% | 84.68% | – | – | – | – | – |
| MD-Net ⋆ | 85.58% | 85.00% | 85.49% | 89.02% | 0.32 M | 4.35 M | 27.32 ms |

| Baseline | A (Adaptive Time-Frequency Masking) | B (MLP-LSTM Dual Branch) | Accuracy | F1-Score | Recall | Precision |
|---|---|---|---|---|---|---|
| 🗸 | × | × | 80.31% | 79.56% | 80.43% | 84.02% |
| 🗸 | 🗸 | × | 84.09% | 83.00% | 83.86% | 88.00% |
| 🗸 | × | 🗸 | 84.00% | 82.82% | 84.05% | 87.73% |
| 🗸 | 🗸 | 🗸 | 85.58% | 85.00% | 85.49% | 89.02% |
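
The accuracy gains implied by the ablation table can be recomputed directly. The sketch below assumes that column A is the Adaptive Time-Frequency Masking module and column B is the MLP-LSTM dual-branch structure, which is inferred from the gains of 3.78 and 3.69 percentage points quoted in the conclusions; the variable names are hypothetical.

```python
baseline = 80.31  # accuracy (%) of the baseline row

# Accuracy (%) of each ablation configuration, copied from the table.
accuracy = {
    "A only": 84.09,
    "B only": 84.00,
    "A and B": 85.58,
}

# Gain over the baseline in percentage points, rounded to two decimals.
gains = {name: round(value - baseline, 2) for name, value in accuracy.items()}

# Sum of the two single-module gains, for comparison with the joint gain.
sum_of_single_gains = round(gains["A only"] + gains["B only"], 2)
```

Comparing `gains["A and B"]` with `sum_of_single_gains` shows whether the two modules combine additively: here the joint gain is larger than either single gain but smaller than their sum.
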

| Dataset | Model | Accuracy | F1-Score | Recall | Precision |
|---|---|---|---|---|---|
| DroneRF | MLP | 80.31% | 79.56% | 80.43% | 84.02% |
| DroneRF | MD-Net | 85.58% | 85.00% | 85.49% | 89.02% |
| DroneRFa | MLP | 80.92% | 80.88% | 80.92% | 81.32% |
| DroneRFa | MD-Net | 83.15% | 83.06% | 83.15% | 83.20% |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Huang, M.; Dou, L.; Sun, Q. MD-Net: A Lightweight Dual-Branch Network with Adaptive Time-Frequency Masking for Robust UAV RF Signal Classification. Information 2026, 17, 562. https://doi.org/10.3390/info17060562

