A Lightweight CNN-Based Method for Micro-Doppler Feature-Based UAV Detection and Classification
Abstract
1. Introduction
2. RangDopplerNet Model Architecture
1. Decomposing channel attention into direction-aware feature encoding. By applying 1D pooling, the output feature map is aggregated separately along the horizontal (X-axis) and vertical (Y-axis) directions to generate two direction-aware feature vectors. For example, horizontal pooling (X Avg Pool) averages each row over its width, encoding the vertical position within each channel in a tensor of shape C × H × 1; vertical pooling (Y Avg Pool) averages each column over its height, encoding the horizontal position within each channel in a tensor of shape C × 1 × W.

   Assume the output feature map of the convolutional layer is a C × H × W tensor, where C is the number of channels, H is the height, and W is the width of the feature map. The horizontal and vertical pooling can be expressed as:

   $$z_c^h(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i)$$

   $$z_c^w(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w)$$

   where $z_c^h(h)$ is the horizontal pooling result of the c-th channel at the h-th row, 1/W averages over the W elements of that row, and $x_c(h, i)$ is the value at the h-th row and i-th column of the c-th channel. Similarly, $z_c^w(w)$ is the vertical pooling result of the c-th channel at the w-th column, 1/H averages over the H elements of that column, and $x_c(j, w)$ is the value at the j-th row and w-th column of the c-th channel.
2. Cross-Directional Feature Fusion and Encoding [16]. After 1D pooling, we obtain a feature representation indexed along the width, denoted as $z^w$, with a tensor shape of C × 1 × W, and one indexed along the height, denoted as $z^h$, with a tensor shape of C × H × 1. These two directional feature vectors are concatenated to form a unified feature tensor of shape C × (H + W) × 1, which is then passed through a shared 1 × 1 convolutional layer for channel compression and a nonlinear transformation. This operation can be expressed as:

   $$f = \delta\left(F_1\left(\left[z^h, (z^w)^{T}\right]\right)\right)$$

   where $[\cdot, \cdot]$ is the concatenation operation along the spatial dimension, $(z^w)^{T}$ is the transpose of $z^w$, $F_1$ is the 1 × 1 convolution, $\delta$ is the nonlinear activation function, and $f$ is the intermediate feature. The number of channels after compression is $C/r$, where $r$ is the reduction ratio.
3. The intermediate feature $f$ is split along the spatial dimension into two parts, $f^h$ and $f^w$, for the height and width directions, respectively. Two separate convolutions, $F_h$ and $F_w$, are then applied to generate the corresponding attention weights. This operation can be expressed as:

   $$g^h = \sigma\left(F_h\left(f^h\right)\right)$$

   $$g^w = \sigma\left(F_w\left(f^w\right)\right)$$

   where $F_h$ and $F_w$ are the convolutions that transform $f^h$ and $f^w$ back to C channels (the same number as the input), $\sigma$ is the sigmoid function, and $g^h$ and $g^w$ are the attention maps along the height and width directions.

   Finally, the computed attention weights are applied to the original feature map through element-wise multiplication, producing a refined feature map of dimensions C × H × W that emphasizes critical regions more effectively. This operation can be expressed as:

   $$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$

   where $y_c$ denotes the new feature map after applying coordinate attention and $x_c$ represents the original feature map. The dimensions of the feature map remain unchanged after introducing coordinate attention, while the local details within the feature map are selectively enhanced.
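
The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `coordinate_attention`, the weight arguments `w1`, `wh` and `ww`, the use of ReLU as the nonlinear activation, the sigmoid gates, the omission of biases and normalization, and the random toy weights are all assumptions made for the example.

```python
import math
import random


def coordinate_attention(x, w1, wh, ww):
    """Apply coordinate attention to x, a nested list indexed as x[c][h][w].

    w1: (C/r) x C weights of the shared 1x1 convolution (biases omitted).
    wh, ww: C x (C/r) weights of the two direction-specific 1x1 convolutions.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])

    # Step 1: directional average pooling.
    z_h = [[sum(x[c][h]) / W for h in range(H)] for c in range(C)]  # C x H
    z_w = [[sum(x[c][h][w] for h in range(H)) / H for w in range(W)]
           for c in range(C)]  # C x W

    # Step 2: concatenate along the spatial axis, then shared 1x1 conv + ReLU.
    y = [z_h[c] + z_w[c] for c in range(C)]  # C x (H + W)
    f = [[max(0.0, sum(w1[m][c] * y[c][p] for c in range(C)))
          for p in range(H + W)] for m in range(len(w1))]  # (C/r) x (H + W)

    # Step 3: split, map back to C channels, squash to (0, 1) with a sigmoid.
    def gate(weights, feat):
        return [[1.0 / (1.0 + math.exp(-sum(weights[c][m] * feat[m][p]
                                             for m in range(len(feat)))))
                 for p in range(len(feat[0]))] for c in range(C)]

    g_h = gate(wh, [row[:H] for row in f])  # C x H
    g_w = gate(ww, [row[H:] for row in f])  # C x W

    # Re-weight every element by its row gate and its column gate.
    return [[[x[c][h][w] * g_h[c][h] * g_w[c][w] for w in range(W)]
             for h in range(H)] for c in range(C)]


def rand_matrix(rows, cols):
    """Toy weights or features drawn uniformly from [-1, 1]."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
C, H, W, R = 4, 3, 5, 2  # R plays the role of C / r with r = 2
x = [rand_matrix(H, W) for _ in range(C)]
out = coordinate_attention(x, rand_matrix(R, C), rand_matrix(C, R),
                           rand_matrix(C, R))
print(len(out), len(out[0]), len(out[0][0]))  # 4 3 5
```

Because both gates lie in (0, 1), each output element is a scaled-down copy of the corresponding input element, and the output keeps the C × H × W shape of the input.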
3. Experimental Data
- Drone Target Characteristics: Drones generally possess a low radar cross-section (RCS), which leads to diminished echo signal strength. Due to their compact design and small physical footprint, the reflected radar energy is confined to a limited set of range cells. Additionally, their micro-Doppler signatures typically exhibit a stable and tightly clustered distribution, reflecting consistent motion patterns from rotating components such as propellers.
- Vehicle Target Characteristics: Vehicles generally possess a high radar cross-section (RCS) owing to their substantial physical size, leading to significant energy dispersion across multiple range cells. Due to their rigid construction and absence of moving parts with relative motion, the resulting Doppler energy distribution tends to be concentrated and stable, reflecting consistent velocity profiles.
- Pedestrian Target Characteristics: Pedestrians generally produce moderate echo signal strength. Their distinctive micro-Doppler signatures stem from gait-induced motion, characterized by relative movement between body parts—most notably, the arms swinging at a higher frequency than the torso. Leg movements exhibit cyclical acceleration and deceleration patterns. Consequently, the Doppler energy is broadly dispersed, capturing the intricate and dynamic nature of human walking behavior.
4. Model Training
4.1. Training Parameter Settings
4.2. Evaluation Metrics
5. Results and Analysis
5.1. Experimental Results
5.2. Limitations and Prospects
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Sun, X.; Wang, S.; Zhang, X.; Wandelt, S. LAERACE: Taking the policy fast-track towards low-altitude economy. J. Air Transp. Res. Soc. 2025, 4, 100058.
2. Shakhatreh, H.; Sawalmeh, A.H.; Al-Fuqaha, A.; Dou, Z.; Almaita, E.; Khalil, I.; Othman, N.S.; Khreishah, A.; Guizani, M. Unmanned Aerial Vehicles (UAVs): A Survey on Civil Applications and Key Research Challenges. IEEE Access 2019, 7, 48572–48634.
3. Khan, M.A.; Menouar, H.; Eldeeb, A.; Abu-Dayya, A.; Salim, F.D. On the Detection of Unauthorized Drones—Techniques and Future Perspectives: A Review. IEEE Sens. J. 2022, 22, 11439–11455.
4. Wang, J. Current Status and Development of Low-Slow-Small Target Surveillance Technology. Radar Sci. Technol. 2020, 18, 5.
5. Tang, Z.; Ma, H.; Qu, Y.; Mao, X. UAV Detection with Passive Radar: Algorithms, Applications, and Challenges. Drones 2025, 9, 76.
6. Dumitrescu, C.; Minea, M.; Costea, I.M.; Cosmin Chiva, I.; Semenescu, A. Development of an Acoustic System for UAV Detection. Sensors 2020, 20, 4870.
7. Aydin, B.; Singha, S. Drone Detection Using YOLOv5. Eng 2023, 4, 416–433.
8. Allahham, M.S.; Khattab, T.; Mohamed, A. Deep Learning for RF-Based Drone Detection and Identification: A Multi-Channel 1-D Convolutional Neural Networks Approach. In Proceedings of the 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT), Doha, Qatar, 2–5 February 2020; pp. 112–117.
9. Han, S.K.; Lee, J.H.; Jung, Y.H. Convolutional Neural Network-Based Drone Detection and Classification Using Overlaid Frequency-Modulated Continuous-Wave (FMCW) Range–Doppler Images. Sensors 2024, 24, 5805.
10. Sadeghi Adl, Z.; Ahmad, F. Whitening-Aided Learning from Radar Micro-Doppler Signatures for Human Activity Recognition. Sensors 2023, 23, 7486.
11. Li, W.; Geng, Y.; Gao, Y.; Ding, Q.; Li, D.; Liu, N.; Chen, J. Doppler Radar-Based Human Speech Recognition Using Mobile Vision Transformer. Electronics 2023, 12, 2874.
12. Li, Y.; Zhang, M.; Jing, H.; Liu, Z. RadarTCN: Lightweight Online Classification Network for Automotive Radar Targets Based on TCN. Sensors 2024, 24, 2813.
13. Xu, Y.; Gao, Z.; Zhai, Y.; Wang, Q.; Gao, Z.; Xu, Z.; Zhou, Y. A CNNA-Based Lightweight Multi-Scale Tomato Pest and Disease Classification Method. Sustainability 2023, 15, 8813.
14. Roldan, I.; del Blanco, C.R.; Duque de Quevedo, Á.; Ibañez Urzaiz, F.; Gismero Menoyo, J.; Asensio López, A.; Berjón, D.; Jaureguizar, F.; García, N. DopplerNet: A convolutional neural network for recognising targets in real scenarios using a persistent range–Doppler radar. IET Radar Sonar Navig. 2020, 14, 593–600.
15. Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13708–13717.
16. Wang, W.; Yao, L.; Chen, L.; Lin, B.; Cai, D.; He, X.; Liu, W. CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention. arXiv 2021, arXiv:2108.00154.
17. Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2016, arXiv:1409.0473.
18. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
19. Duque de Quevedo, Á.; Ibañez Urzaiz, F.; Gismero Menoyo, J.; Asensio López, A. Drone detection and radar-cross-section measurements by RAD-DAR. IET Radar Sonar Navig. 2019, 13, 1437–1447.
20. Minkler, G.; Minkler, J. CFAR: The Principles of Automatic Radar Detection in Clutter; Magellan Book Company: Bamberg, Germany, 1990.
21. Cao, Z.; Li, J.; Song, C.; Xu, Z.; Wang, X. Compressed Sensing-Based Multitarget CFAR Detection Algorithm for FMCW Radar. IEEE Trans. Geosci. Remote Sens. 2021, 59, 9160–9172.
22. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444.
23. Swaminathan, S.; Tantri, B.R. Confusion Matrix-Based Performance Evaluation Metrics. Afr. J. Biomed. Res. 2024, 27, 4023–4031.
24. Powers, D.M.W. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv 2020, arXiv:2010.16061.










| Layer | Stride | Padding | Type | Filter Shape | Output Size |
|---|---|---|---|---|---|
| 1 | 2 | 1 | Convolutional Layer | 3 × 3 | 6 × 31 × 32 |
| 2 | 2 | 1 | Convolutional Layer | 3 × 3 | 3 × 16 × 32 |
| 3 | 2 | 1 | Convolutional Layer | 3 × 3 | 2 × 8 × 64 |
| 4 | - | - | Fully Connected Layer | - | 64 |
| 5 | - | - | Fully Connected Layer | - | 64 |
| 6 | - | - | Fully Connected Layer | - | 3 |
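
The layer table above can be cross-checked with a short calculation. The sketch below assumes a single-channel 11 × 61 range-Doppler input, output sizes written as height × width × channels, one bias per filter or neuron, and the usual floor convention for strided convolutions. None of these are stated in the table itself, and the helper names `conv_out`, `conv_params` and `fc_params` are hypothetical. Under those assumptions the feature-map sizes reproduce the Output Size column, and the parameter total agrees with the 98,019 reported for RangDopplerNet Type-1 in the model comparison table.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of a strided convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(c_in, c_out, kernel=3):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * c_in * c_out + c_out


def fc_params(n_in, n_out):
    """Dense weights plus one bias per output neuron."""
    return n_in * n_out + n_out


# Assumed input: one 11 x 61 single-channel range-Doppler map.
h, w, c = 11, 61, 1
total = 0
shapes = []
for c_out in (32, 32, 64):  # filter counts read from the Output Size column
    total += conv_params(c, c_out)
    h, w, c = conv_out(h), conv_out(w), c_out
    shapes.append((h, w, c))

n = h * w * c  # flattened feature length fed to the first dense layer
for n_out in (64, 64, 3):
    total += fc_params(n, n_out)
    n = n_out

print(shapes)  # [(6, 31, 32), (3, 16, 32), (2, 8, 64)]
print(total)   # 98019
```

Most of the budget sits in the first fully connected layer (1024 × 64 weights), which suggests why stride-2 downsampling before flattening matters for keeping the model small.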

| Parameter | Value |
|---|---|
| Radar frequency | 8.75 GHz (X Band) |
| Bandwidth | 500 MHz |
| Range resolution | 0.878 m |
| Number of range bins | 4096 |
| Maximum unambiguous range (R) | 3.596 km |
| Doppler resolution | 0.34 km/h |
| Number of integrated ramps | 512 |
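
As a quick consistency check on the radar parameters, the maximum unambiguous range appears to equal the range resolution multiplied by the number of range bins. The snippet below is only that arithmetic, plus a unit conversion of the Doppler resolution to m/s. The variable names are illustrative, and the relation itself is an assumption about how the tabulated values were derived.

```python
range_resolution_m = 0.878   # from the table
num_range_bins = 4096        # from the table
doppler_resolution_kmh = 0.34

# Assumed relation: maximum range = range resolution x number of range bins.
max_range_km = range_resolution_m * num_range_bins / 1000.0
print(round(max_range_km, 3))  # 3.596

# Same Doppler (velocity) resolution expressed in metres per second.
doppler_resolution_ms = doppler_resolution_kmh / 3.6
print(round(doppler_resolution_ms, 4))  # 0.0944
```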

| Location of CA Module | Training Accuracy (%) | Validation Accuracy (%) | FLOPs |
|---|---|---|---|
| Loc1 | 97.04 | 96.60 | 1.15 M |
| Loc2 | 98.06 | 97.28 | 0.93 M |
| Loc3 | 99.24 | 98.08 | 1.24 M |

| Network Architecture | Training Accuracy (%) | Validation Accuracy (%) | Number of Parameters | FLOPs |
|---|---|---|---|---|
| Convolutional Kernel Size (3 × 3) | 99.24 | 98.08 | 101,419 | 1.24 M |
| Convolutional Kernel Size (7 × 7) | 99.76 | 98.23 | 223,867 | 4.43 M |
| Channel Expansion | 99.64 | 98.03 | 375,195 | 2.48 M |

| Network Type | Sample Category | Recall (%) | F1 Score (%) | Accuracy (%) |
|---|---|---|---|---|
| RangDopplerNet Type-1 | Vehicle | 93.79 | 95.25 | 96.75 |
| RangDopplerNet Type-2 | Vehicle | 96.24 | 97.53 | 98.48 |
| RangDopplerNet Type-1 | Unmanned Aerial Vehicle | 96.55 | 94.92 | 93.33 |
| RangDopplerNet Type-2 | Unmanned Aerial Vehicle | 98.07 | 96.89 | 95.74 |
| RangDopplerNet Type-1 | Pedestrian | 99.27 | 99.23 | 99.20 |
| RangDopplerNet Type-2 | Pedestrian | 99.63 | 99.56 | 99.49 |
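
The F1 score is the harmonic mean of precision and recall. The sketch below recomputes it for the Type-1 rows under the assumption that the final column plays the role of precision. This is an assumption, since the table labels that column Accuracy, and the helper name `f1_score` and the dictionary layout are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2.0 * precision * recall / (precision + recall)


# (recall, reported F1, last-column value) for the Type-1 rows of the table.
type1_rows = {
    "Vehicle": (93.79, 95.25, 96.75),
    "Unmanned Aerial Vehicle": (96.55, 94.92, 93.33),
    "Pedestrian": (99.27, 99.23, 99.20),
}

recomputed = {}
for name, (recall, reported_f1, last_column) in type1_rows.items():
    # Treat the last column as precision (assumption, see the text above).
    recomputed[name] = f1_score(last_column, recall)
    print(name, round(recomputed[name], 2), reported_f1)
```

Under that reading, the recomputed values for these three rows land within about 0.01 percentage points of the reported F1 scores.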

| Model Type | Average Accuracy (%) | Number of Parameters | Relative Parameter Ratio (%) | FLOPs |
|---|---|---|---|---|
| NasNetMobile | 97.69 | 4,272,887 | 100 | - |
| DopplerNet | 99.48 | 3,818,755 | 89.37 | - |
| MobileNetV2 | 98.94 | 3,325,043 | 77.82 | - |
| MobileNetV1 | 96.83 | 3,209,475 | 75.11 | 580.7 M |
| SqueezeNext | 96.78 | 589,171 | 13.8 | 225.7 M |
| RangDopplerNet Type-1 | 96.71 | 98,019 | 2.29 | 0.87 M |
| RangDopplerNet Type-2 | 98.08 | 101,419 | 2.37 | 1.24 M |
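
The Relative Parameter Ratio column appears to be each model's parameter count divided by that of NasNetMobile, expressed in percent. The snippet below reproduces that arithmetic from the tabulated counts. The dictionary and variable names are illustrative, and the choice of NasNetMobile as the baseline is inferred from its 100% entry.

```python
params = {
    "NasNetMobile": 4_272_887,
    "DopplerNet": 3_818_755,
    "MobileNetV2": 3_325_043,
    "MobileNetV1": 3_209_475,
    "SqueezeNext": 589_171,
    "RangDopplerNet Type-1": 98_019,
    "RangDopplerNet Type-2": 101_419,
}

baseline = params["NasNetMobile"]  # assumed reference model (100 % row)
ratios = {name: round(100.0 * n / baseline, 2) for name, n in params.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio} %")

print(ratios["RangDopplerNet Type-2"])  # 2.37
```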
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, L.; Tu, G.; Xu, Y.; Zhou, X. A Lightweight CNN-Based Method for Micro-Doppler Feature-Based UAV Detection and Classification. Electronics 2025, 14, 4831. https://doi.org/10.3390/electronics14244831

