Domain-Unified Adaptive Detection Framework for Small Vehicle Targets in Monostatic/Bistatic SAR Images
Highlights
- A domain-unified adaptive target detection framework (DUA-TDF) for small vehicle targets in monostatic/bistatic SAR images is proposed.
- The multi-scale detail-aware CycleGAN (MSDA-CycleGAN) and the cross-window axial self-attention target detection network (CWASA-Net) are proposed to enable effective and robust detection of small vehicle targets under both within-domain and cross-domain testing conditions.
- The proposed MSDA-CycleGAN achieves unpaired image style transfer while emphasizing both global structure and local details of the generated images, significantly enhancing the generalization capability of downstream target detection models.
- The proposed CWASA-Net improves detection performance for small targets in complex backgrounds through the collaborative optimization of feature extraction and feature fusion.
Abstract
1. Introduction
- (1) To address the significant performance degradation caused by the mismatch between the distributions of the training and test data, we propose a style generation network named MSDA-CycleGAN to align the source and target domains at the image level, thereby achieving unpaired image style transfer. The network incorporates two key designs for precise, detail-preserving alignment: first, a multilayer guided filtering (MGF) module is embedded at the end of the generator to adaptively enhance high-frequency details in the generated images; second, a structural similarity constraint is introduced into the cycle-consistency loss, forcing the model to preserve critical geometric structures and spatial context during style transfer (a minimal loss sketch follows this list). Together, these mechanisms ensure that the aligned images are more conducive to the generalization of downstream detection models.
- (2) To tackle the challenge of detecting small targets under complex background clutter, we propose a cross-window axial self-attention target detection network (CWASA-Net), which jointly optimizes feature extraction and feature fusion. On one hand, the cross-window axial self-attention mechanism is applied in the deeper layers of the backbone, efficiently refining high-level features by capturing both local details and global dependencies and thereby compensating for the information scarcity of small targets (an axial-attention sketch follows this list). On the other hand, a convolution-based stacked cross-scale feature fusion network serves as the neck: its stacked fusion modules, combined with bidirectional fusion paths, integrate and enhance the multi-scale features extracted by the backbone, strengthening the interaction between shallow localization information and deep semantic information. The two components work synergistically to produce fused features rich in both contextual information and fine local detail, significantly enhancing the discriminative representation of multi-scale features.
- (3) To validate the effectiveness of the proposed algorithm, we conduct comprehensive experiments on the self-developed monostatic/bistatic SAR datasets under multiple within-domain and cross-domain testing conditions. The superiority and generalization capability of our method are verified through comparisons with other state-of-the-art (SOTA) target detection algorithms, ablation experiments, and generalization experiments on a public dataset.
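To make the structural-similarity constraint in contribution (1) concrete, below is a minimal PyTorch sketch of a cycle-consistency loss augmented with an SSIM term. The window size and the weights `lambda_l1` and `lambda_ssim` are illustrative assumptions, not the paper's settings; MSDA-CycleGAN defines its own loss weighting.

```python
# Sketch: L1 cycle-consistency loss plus a (1 - SSIM) structural penalty.
# Window size and lambda weights are illustrative assumptions.
import torch
import torch.nn.functional as F

def ssim(x: torch.Tensor, y: torch.Tensor, window: int = 11,
         c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    """Mean SSIM over a batch of images scaled to [0, 1], uniform window."""
    pad = window // 2
    mu_x = F.avg_pool2d(x, window, 1, pad)
    mu_y = F.avg_pool2d(y, window, 1, pad)
    sigma_x = F.avg_pool2d(x * x, window, 1, pad) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, window, 1, pad) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, window, 1, pad) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return (num / den).mean()

def cycle_loss_with_ssim(real: torch.Tensor, reconstructed: torch.Tensor,
                         lambda_l1: float = 10.0,
                         lambda_ssim: float = 1.0) -> torch.Tensor:
    """L1 cycle term plus a structural term that penalizes geometry drift."""
    l1 = F.l1_loss(reconstructed, real)
    structural = 1.0 - ssim(real, reconstructed)
    return lambda_l1 * l1 + lambda_ssim * structural

# Usage: `reconstructed` would be G_BA(G_AB(real)) in a CycleGAN setup.
real = torch.rand(2, 1, 64, 64)          # e.g., single-channel SAR chips
reconstructed = torch.rand(2, 1, 64, 64)
loss = cycle_loss_with_ssim(real, reconstructed)
```

Because the SSIM term compares local means, variances, and covariances rather than raw pixels, it pushes the generator to keep edges and target shapes intact even when the L1 term alone would tolerate blur.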
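Similarly, the cross-window axial self-attention of contribution (2) builds on axial attention, which factorizes full 2-D self-attention into a row-wise pass followed by a column-wise pass, reducing cost from O((HW)²) to O(HW(H+W)). The sketch below shows only this factorization; the head count is an arbitrary assumption, and the paper's cross-window shifting scheme and detection integration are not reproduced here.

```python
# Sketch: axial self-attention over a feature map (row pass, then column
# pass, each with a residual connection). Head count is illustrative.
import torch
import torch.nn as nn

class AxialSelfAttention(nn.Module):
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, h, w = x.shape
        # Row attention: each of the B*H rows is a sequence of W tokens.
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.row_attn(rows, rows, rows)
        x = rows.reshape(b, h, w, c).permute(0, 3, 1, 2) + x
        # Column attention: each of the B*W columns is a sequence of H tokens.
        cols = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        cols, _ = self.col_attn(cols, cols, cols)
        return cols.reshape(b, w, h, c).permute(0, 3, 2, 1) + x

# Usage: refine a deep backbone feature map in place of full 2-D attention.
feat = torch.randn(2, 256, 16, 16)
out = AxialSelfAttention(256)(feat)   # same shape: (2, 256, 16, 16)
```

After the two passes, every position has aggregated context from its entire row and column, which is how axial schemes recover near-global receptive fields at a cost small targets' feature maps can afford.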
2. Related Work
2.1. SAR Target Detection
2.2. Small Target Detection
2.3. Domain Adaptation for Target Detection
3. Dataset Description
3.1. Monostatic and Bistatic MiniSAR Systems
3.2. Monostatic/Bistatic SAR Datasets
4. Methodology
4.1. Framework Overview
4.2. Image-to-Image Translation
4.2.1. Multilayer Guided Filtering Module
4.2.2. Optimization of Loss Function
4.3. Feature Extraction and Target Detection
4.3.1. Cross-Window Axial Self-Attention Based Intra-Scale Feature Interaction Module
Standard Window Axial Self-Attention Mechanism
Shifted Window Axial Self-Attention Mechanism
4.3.2. Convolution-Based Stacked Cross-Scale Feature Fusion Network
Differentiated Convolutional Kernel Configuration Strategy
Convolution-Based Stacked Fusion Module
5. Experiments
5.1. Introduction of Datasets
5.2. Implementation Details
5.3. Evaluation Metrics
5.4. Comprehensive Experiments Under Various Testing Conditions
5.5. Comparative Experiments
5.6. Ablation Experiments
5.7. Generalization Experiments
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Xu, Y.H.; Zhang, F.B.; Chen, L.Y.; Wan, Y.L.; Jiang, T. A novel error correction method for airborne HRWS SAR based on azimuth-variant attitude and range-variant Doppler domain pattern. Remote Sens. 2025, 17, 2831.
- Lv, J.M.; Zhu, D.Y.; Geng, Z.; Han, S.L.; Wang, Y.; Yang, W.X.; Ye, Z.; Zhou, T. Recognition of deformation military targets in the complex scenes via MiniSAR submeter images with FASAR-Net. IEEE Trans. Geosci. Remote Sens. 2023, 61.
- Chen, Y.; Li, W.; Sakaridis, C.; Dai, D.; Van Gool, L. Domain adaptive Faster R-CNN for object detection in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3339–3348.
- Finn, H.M.; Johnson, R.S. Adaptive detection mode with threshold control as a function of spatially sampled clutter-level estimates. RCA Rev. 1968, 29, 414–465.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.Q.; He, K.M.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.M.; Dollár, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229.
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159.
- Pan, G.L.; Wu, Q.H.; Zhou, B.; Li, J.; Wang, W.; Ding, G.R. Spectrum prediction with deep 3D pyramid vision transformer learning. IEEE Trans. Wirel. Commun. 2025, 24, 509–525.
- Chen, C.; Liu, M.Y.; Tuzel, O.; Xiao, J. R-CNN for small object detection. In Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; pp. 214–230.
- Zhao, M.; Zhang, X.; Kaup, A. Multitask learning for SAR ship detection with Gaussian-mask joint segmentation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5214516.
- Wan, H. AFSar: An anchor-free SAR target detection algorithm based on multiscale enhancement representation learning. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5219514.
- Chen, P.; Wang, Y.; Liu, H. GCN-YOLO: YOLO based on graph convolutional network for SAR vehicle target detection. IEEE Geosci. Remote Sens. Lett. 2024, 21.
- Du, Y.; Du, L.; Guo, Y.; Shi, Y. Adaptive anchor-based detector with constrained RIRConv for oriented vehicles in SAR images. IEEE Trans. Geosci. Remote Sens. 2025, 63.
- Lv, J.M.; Zhu, D.; Geng, Z.; Chen, H.; Huang, J.; Niu, S.; Ye, Z.; Zhou, T.; Zhou, P. Efficient target detection of monostatic/bistatic SAR vehicle small targets in ultracomplex scenes via lightweight model. IEEE Trans. Geosci. Remote Sens. 2024, 62.
- Zhou, P.; Wang, P.; Cao, J.; Zhu, D.Y.; Yin, Q.Y.; Lv, J.M.; Chen, P.; Jie, Y.S.; Jiang, C. PSFNet: Efficient detection of SAR image based on petty-specialized feature aggregation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 190–205.
- Zhou, P.; Wang, P.; Zhu, D.Y.; Zhou, B.L.; Lv, J.M.; Ye, Z. SFANet: Efficient detection of vehicle targets in SAR images based on SAR-specialized feature aggregation. IEEE Trans. Instrum. Meas. 2025, 74.
- Ye, Z.; Zhou, P.; Zhu, D.Y.; Lv, J.M. Aggregated-refined feature pyramid network for small vehicle target detection in monostatic bistatic SAR images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 23397–23415.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680.
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2242–2251.
- Yi, Z.; Zhang, H.; Tan, P.; Gong, M. DualGAN: Unsupervised dual learning for image-to-image translation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2849–2857.
- Inoue, N.; Furuta, R.; Yamasaki, T.; Aizawa, K. Cross-domain weakly-supervised object detection through progressive domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5001–5009.
- Sun, B.C.; Saenko, K. Deep CORAL: Correlation alignment for deep domain adaptation. In Proceedings of the European Conference on Computer Vision Workshops, Amsterdam, The Netherlands, 8–10 and 15–16 October 2016; pp. 443–450.
- Saito, K.; Ushiku, Y.; Harada, T.; Saenko, K. Strong-weak distribution alignment for adaptive object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6949–6958.
- Cao, S.; Joshi, D.; Gui, L.Y.; Wang, Y.X. Contrastive mean teacher for domain adaptive object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 23839–23848.
- Li, Y.-J. Cross-domain adaptive teacher for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 7571–7580.
- Liu, Z.; Luo, S.; Wang, Y. Mix MSTAR: A synthetic benchmark dataset for multi-class rotation vehicle detection in large-scale SAR images. Remote Sens. 2023, 15, 4558.
- Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv 2019, arXiv:1904.07850.
- Zhu, X.K.; Lyu, S.C.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Montreal, QC, Canada, 11–17 October 2021; pp. 2778–2788.
- Zhou, Z.Y. Comprehensive discussion on remote sensing modeling and dynamic electromagnetic scattering for aircraft with speed brake deflection. Remote Sens. 2025, 17, 1706.
- Golubović, D.; Erić, M.; Vukmirović, N. Improved detection of targets on the high-resolution range-Doppler map in HFSWRs. In Proceedings of the International Symposium INFOTEH-JAHORINA, East Sarajevo, Bosnia and Herzegovina, 20–22 March 2024; pp. 1–6.
- Golubović, D.; Erić, M.; Vukmirović, N. High-resolution azimuth detection of targets in HFSWRs under strong interference. In Proceedings of the International Conference on Electrical, Electronic and Computing Engineering, Niš, Serbia, 3–6 June 2024; pp. 1–6.
- Golubović, D.; Marjanović, D. An experimentally-based method for detection threshold determination in HFSWR’s high-resolution range-Doppler map target detection. In Proceedings of the International Symposium INFOTEH-JAHORINA, East Sarajevo, Bosnia and Herzegovina, 19–21 March 2025; pp. 1–6.
| System Parameters | Value | Unit |
|---|---|---|
| Mode | Spotlight | - |
| Center frequency | 9.7 | GHz |
| Bandwidth | 1800 | MHz |
| Pulse width | 2 | ms |
| PRF | 500 | Hz |
| Depression angle | 15 / 26 / 31 / 37 / 45 | ° |
| Flight altitude | 94 / 150 / 210 / 265 / 350 | m |
| Operation distance | 360 / 335 / 407 / 439 / 495 | m |
| Flight speed | 5–7 | m/s |
| Resolution | 0.1 × 0.1 | m × m |
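As a quick consistency check on the table (the standard pulse-compression relation, not a figure from the paper), the 1800 MHz bandwidth implies a slant-range resolution of

$$\rho_r = \frac{c}{2B} = \frac{3 \times 10^{8}\ \mathrm{m/s}}{2 \times 1.8 \times 10^{9}\ \mathrm{Hz}} \approx 0.083\ \mathrm{m},$$

which agrees with the quoted 0.1 m resolution once windowing and sidelobe-control losses are accounted for.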
| System Parameters | Value | Unit |
|---|---|---|
| Mode | Spotlight | - |
| Center frequency | 9.7 | GHz |
| Bandwidth | 1800 | MHz |
| Pulse width | 1.9012 | ms |
| PRF | 525 | Hz |
| Flight speed | 5–7 | m/s |
| Resolution | 0.16 × 0.26 | m × m |
| Flight | Transmitter Depression Angle | Receiver Depression Angle | Azimuth Bistatic Angle |
|---|---|---|---|
| 1 | 15° | 45° | 0° |
| 2 | 15° | 60° | 0° |
| 3 | 30° | 45° | 0° |
| 4 | 40° | 45° | 0° |
| 5 | 45° | 45° | 5° |
| 6 | 45° | 45° | 15° |
| 7 | 45° | 45° | 30° |
| 8 | 45° | 45° | 45° |
| 9 | 30° | 30° | 30° |
| Index | Full Name | Abbreviation |
|---|---|---|
| 1 | Soviet T34-85 Medium Tank | T34-85 |
| 2 | Domestic 59-1 130 mm Tower Cannon | 59-1TC |
| 3 | Domestic 59 Anti-aircraft Guns | 59AG |
| 4 | Shenyang J-6 | J6 |
| 5 | Homemade 54 122 mm Towed Howitzer | 54TH |
| 6 | Domestic 63A Amphibious Tank | 63AT |
| 7 | 63C Amphibious Armored Vehicle | 63CAAV |
| 8 | Domestic Type 62 Light Tank | 62LT |
| 9 | 63 Armored Personnel Transport Vehicle | 63APTV |
| 10 | Ground-Based Detection Radar | Radar |
| 11 | Antiaircraft Machine Gun | AMG |
| Month | Depression Angle | Monostatic Number | Flight | Bistatic Number |
|---|---|---|---|---|
| March | 15° | 151 | 1 | 289 |
| | 31° | 194 | 2 | 299 |
| | 45° | 169 | 3 | 294 |
| | Total | 514 | 4 | 287 |
| | | | 5 | 292 |
| July | 26° | 231 | 6 | 288 |
| | 31° | 286 | 7 | 299 |
| | 37° | 297 | 8 | 283 |
| | 45° | 295 | 9 | 292 |
| | Total | 1109 | Total | 2623 |
| EXP ID | Train Dataset | Train Number | Test Dataset | Test Number |
|---|---|---|---|---|
| EXP 1-1 | Monostatic March all | 514 | Monostatic July all | 1109 |
| EXP 1-2 | Monostatic July all | 1109 | Monostatic March all | 514 |
| EXP 2-1 | Bistatic flight all | 2623 | Monostatic March all | 514 |
| EXP 2-2 | Bistatic flight all | 2623 | Monostatic July all | 1109 |
| EXP 3-1 | Monostatic March 15°, 31° | 345 | Monostatic March 45° | 169 |
| EXP 3-2 | Monostatic July 26°, 31°, 37° | 814 | Monostatic July 45° | 295 |
| EXP 4-1 | Bistatic flight 1, 3, 5, 7 | 1174 | Bistatic flight 9 | 292 |
| EXP 4-2 | Bistatic flight 2, 4, 6, 8 | 1157 | Bistatic flight 9 | 292 |
| EXP ID | Precision (%) | Recall (%) | F1-Score (%) | mAP50 (%) |
|---|---|---|---|---|
| EXP 1-1 | 82.1 | 88.3 | 84.7 | 86.5 |
| EXP 1-2 | 89.5 | 90.2 | 89.6 | 90.4 |
| EXP 2-1 | 79.9 | 81.4 | 79.4 | 80.9 |
| EXP 2-2 | 81.1 | 81.3 | 80.9 | 83.0 |
| EXP 3-1 | 88.1 | 89.6 | 88.4 | 90.7 |
| EXP 3-2 | 88.4 | 88.7 | 88.2 | 90.9 |
| EXP 4-1 | 88.3 | 81.8 | 84.9 | 90.2 |
| EXP 4-2 | 87.6 | 82.7 | 85.1 | 90.1 |
| Method | Backbone | mAP50 (%) | F1-Score (%) | Parameters | GFLOPs | FPS |
|---|---|---|---|---|---|---|
| Faster R-CNN [7] | ResNet-50 | 73.4 | 64.8 | 136.730 M | 401.739 | 40.1 |
| SSD [11] | VGG16 | 63.1 | 52.1 | 23.879 M | 274.053 | 74.6 |
| RetinaNet [12] | ResNet-50 | 70.8 | 69.2 | 36.371 M | 164.200 | 47.1 |
| CenterNet [34] | ResNet-50 | 74.7 | 73.5 | 32.665 M | 109.714 | 78.2 |
| YOLOv5 | CSPDarkNet-53 | 73.9 | 71.7 | 46.642 M | 114.593 | 55.5 |
| YOLOv7 [10] | CSPDarkNet-53 | 77.9 | 73.2 | 37.205 M | 105.148 | 54.7 |
| YOLOv8 | CSPDarkNet-53 | 75.6 | 74.8 | 43.632 M | 165.412 | 58.5 |
| LTY-Network [21] | DarkNet-53 | 77.3 | 74.8 | - | - | 37.2 |
| ARF-YOLO [24] | CSPDarkNet-53 | 78.9 | 71.8 | 38.459 M | 123.530 | 38.3 |
| Our method | CSPDarkNet-53 | 86.5 | 84.7 | 88.389 M | 269.916 | 67.8 |
| YOLOv8 | MSDA-CycleGAN | CWASA-IFIM | CSFM | mAP50 (%) | F1-Score (%) | Parameters | GFLOPs |
|---|---|---|---|---|---|---|---|
| ✓ | | | | 75.6 | 74.8 | 43.632 M | 165.412 |
| ✓ | ✓ | | | 83.4 | 83.5 | 71.915 M | 221.776 |
| ✓ | | ✓ | | 76.9 | 75.6 | 46.530 M | 169.427 |
| ✓ | | | ✓ | 76.8 | 75.4 | 57.213 M | 209.561 |
| ✓ | | ✓ | ✓ | 77.7 | 76.7 | 60.106 M | 213.551 |
| ✓ | ✓ | ✓ | | 85.6 | 84.4 | 74.813 M | 225.792 |
| ✓ | ✓ | | ✓ | 85.4 | 84.1 | 85.496 M | 265.926 |
| ✓ | ✓ | ✓ | ✓ | 86.5 | 84.7 | 88.389 M | 269.916 |
| Method | mAP50 (%) | Parameters | GFLOPs |
|---|---|---|---|
| SSD [11] | 64.4 | 26.151 M | 118.945 |
| RetinaNet [12] | 48.8 | 36.724 M | 72.111 |
| CenterNet [34] | 44.0 | 32.665 M | 46.354 |
| YOLOv5 | 74.1 | 46.734 M | 48.539 |
| YOLOv7 [10] | 50.9 | 37.297 M | 44.548 |
| YOLOv8 | 81.6 | 43.645 M | 69.917 |
| TPH-YOLOv5 [35] | 57.5 | 45.578 M | 130.637 |
| PSFNet [22] | 80.8 | 40.296 M | 57.831 |
| Our method | 82.5 | 60.115 M | 90.946 |