A Multi-Modal Approach for Robust Oriented Ship Detection: Dataset and Methodology
Highlights
- We constructed MOS-Ship, a novel, high-resolution (sub-meter), spatially aligned optical-SAR dataset designed to assess multimodal detection accuracy. It uniquely includes a synthetic degradation benchmark (featuring multi-level cloud and fog) to specifically evaluate model robustness under realistic adverse weather conditions.
- We propose MOS-DETR, a novel detection framework built upon a query-based architecture featuring an innovative multimodal encoding backbone. This design effectively integrates optical textures and SAR scattering signatures at the feature level, and is further coupled with an adaptive probabilistic fusion mechanism to ensure high accuracy under adverse weather.
- Our MOS-Ship dataset and benchmark bridge the gap between idealized research settings and real-world operational challenges, providing a critical resource for developing and validating truly all-weather multimodal algorithms.
- The proposed multi-modal and adaptive fusion approach offers a practical and resilient solution for robust maritime surveillance, ensuring reliable ship detection even when optical satellite imagery is obscured or degraded by poor weather.
Abstract
1. Introduction
1. We construct MOS-Ship, a novel, high-quality multimodal optical–SAR ship detection dataset. Its primary advantages over existing datasets include precise spatial alignment between modalities, sub-meter resolution imagery, and extensive coverage of complex maritime environments such as major ports and straits. MOS-Ship captures temporally non-synchronous, dynamic targets and is augmented with a synthetic benchmark featuring multi-level cloud and fog simulations to support robust multimodal evaluation (a toy fog-simulation sketch follows this list).
2. We propose MOS-DETR (Multi-modal Oriented-Ship DEtection TRansformer), a novel detection framework that integrates multimodal feature encoding into an oriented query-based detector to achieve precise and robust ship detection under spatial–temporal asynchrony.
3. We develop a probabilistic decision integration mechanism that adaptively fuses detection results from optical and SAR modalities according to image quality, ensuring reliable performance under both clear and degraded conditions.
4. Extensive experiments on our proposed dataset demonstrate the framework’s effectiveness. Our method achieves high accuracy on the multimodal (84.1 AP50), RGB-only (88.8 AP50), and SAR-only (77.0 AP50) test splits. Crucially, under simulated adverse weather, our probabilistic fusion mechanism improves detection accuracy by 17.7 percentage points over the optical-only baseline (74.4% vs. 56.7%), confirming its robustness.
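The multi-level cloud and fog benchmark mentioned in contribution 1 is only summarized in this outline. As a minimal, illustrative sketch, synthetic haze of this kind is commonly generated with the standard atmospheric scattering model I = J·t + A·(1 − t); the function name `add_fog`, the severity levels, and the uniform transmission map below are assumptions for illustration, not the authors' exact augmentation pipeline.

```python
import numpy as np

def add_fog(rgb: np.ndarray, severity: float, airlight: float = 0.9) -> np.ndarray:
    """Apply synthetic haze/fog with the atmospheric scattering model
    I = J * t + A * (1 - t).

    rgb      : HxWx3 float array in [0, 1] (clear optical image J)
    severity : in [0, 1]; higher severity lowers the transmission t
    airlight : global atmospheric light A (assumed spatially constant here)
    """
    # Uniform transmission map; a spatially varying t (e.g., low-frequency
    # noise) would produce patchier, more cloud-like occlusion.
    t = 1.0 - severity
    fogged = rgb * t + airlight * (1.0 - t)
    return np.clip(fogged, 0.0, 1.0)

# Example: three degradation levels, loosely mirroring a multi-level benchmark.
clear = np.random.rand(512, 512, 3)           # stand-in for an optical tile
levels = {"light": 0.3, "medium": 0.6, "heavy": 0.85}
degraded = {name: add_fog(clear, s) for name, s in levels.items()}
```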
2. Materials and Methods
2.1. Multi-Modal Optical-SAR Ship Dataset
2.1.1. Data Acquisition and Collection Strategy
2.1.2. Spatio-Temporal Co-Registration
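Section 2.1.2 covers spatial co-registration of the temporally non-synchronous optical and SAR scenes. The paper's exact procedure is not reproduced in this outline; the sketch below only illustrates the generic final step of such pipelines, assuming a cross-modal matcher (e.g., LightGlue [21]) has already produced keypoint correspondences, from which OpenCV estimates a homography and resamples the SAR tile onto the optical grid. Function and argument names are illustrative.

```python
import cv2
import numpy as np

def register_sar_to_optical(sar: np.ndarray,
                            pts_sar: np.ndarray,
                            pts_opt: np.ndarray,
                            out_shape: tuple) -> np.ndarray:
    """Warp a SAR tile onto the optical pixel grid from matched keypoints.

    pts_sar, pts_opt : (N, 2) arrays of corresponding pixel coordinates
                       (N >= 4), e.g., from a cross-modal feature matcher.
    out_shape        : (height, width) of the optical tile.
    """
    src = np.asarray(pts_sar, dtype=np.float32)
    dst = np.asarray(pts_opt, dtype=np.float32)
    # Robust homography estimation; RANSAC rejects bad correspondences.
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
    h, w = out_shape
    return cv2.warpPerspective(sar, H, (w, h), flags=cv2.INTER_LINEAR)
```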
2.1.3. Dataset Finalization and Augmentation
2.2. Multi-Modal Oriented-Ship Detection Transformer
2.2.1. Overall Framework
2.2.2. Backbone
Dual-Branch Patch Embedding (DPE)
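The outline names Dual-Branch Patch Embedding (DPE) as the entry point of the multimodal backbone without detailing its layers. The PyTorch snippet below is therefore only a plausible sketch, assuming Swin-style 4×4 convolutional patch embedding [25] with a separate projection per modality; the module name, channel counts, and single-channel SAR input are assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class DualBranchPatchEmbed(nn.Module):
    """Separate patch-embedding branches for optical (RGB) and SAR inputs.

    Assumes Swin-style 4x4 non-overlapping patches; each modality gets its own
    projection so that optical texture and SAR scattering statistics are
    embedded with modality-specific weights before the shared encoder.
    """
    def __init__(self, embed_dim: int = 96, patch_size: int = 4):
        super().__init__()
        self.rgb_proj = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.sar_proj = nn.Conv2d(1, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        proj = self.rgb_proj if modality == "rgb" else self.sar_proj
        x = proj(x)                          # (B, C, H/4, W/4)
        x = x.flatten(2).transpose(1, 2)     # (B, N, C) token sequence
        return self.norm(x)
```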
Modality-Conditioned LoRA (MC-LoRA)
- (i) Window-based Multi-Head Self-Attention (W-MSA) / Shifted Window-based Multi-Head Self-Attention (SW-MSA)
- (ii) Feed-Forward Network (FFN)
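MC-LoRA is attached to (i) the W-MSA/SW-MSA projections and (ii) the FFN layers of the backbone. As a hedged sketch of one way to realize modality-conditioned low-rank adaptation [26], the layer below keeps a shared linear projection frozen and adds a trainable low-rank pair (A_m, B_m) selected by the input modality; the class name, rank, and scaling are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MCLoRALinear(nn.Module):
    """Frozen shared linear layer plus per-modality low-rank adapters (LoRA).

    y = W x + (alpha / r) * B_m A_m x, where (A_m, B_m) is selected by the
    input modality. Only the adapters are trainable, so the added cost is
    r * (d_in + d_out) parameters per modality.
    """
    def __init__(self, d_in: int, d_out: int, rank: int = 8, alpha: float = 16.0,
                 modalities=("rgb", "sar")):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)   # shared weights stay frozen
        self.base.bias.requires_grad_(False)
        self.scale = alpha / rank
        # Standard LoRA init: A small random, B zero, so the adapter starts as identity.
        self.A = nn.ParameterDict({m: nn.Parameter(torch.randn(rank, d_in) * 0.01)
                                   for m in modalities})
        self.B = nn.ParameterDict({m: nn.Parameter(torch.zeros(d_out, rank))
                                   for m in modalities})

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        delta = (x @ self.A[modality].t()) @ self.B[modality].t()
        return self.base(x) + self.scale * delta
```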
Parameter-Efficiency Analysis
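For the parameter-efficiency analysis, a back-of-the-envelope comparison clarifies why such adapters are cheap: fully fine-tuning a d_in × d_out projection updates d_in·d_out weights, whereas per-modality LoRA adds only r·(d_in + d_out) trainable weights per modality. The numbers below are illustrative, not the paper's reported counts.

```python
# Toy parameter count for one 768-dim linear projection (illustrative only).
d_in = d_out = 768
rank, modalities = 8, 2

full_finetune = d_in * d_out                  # 589,824 weights updated
mc_lora = modalities * rank * (d_in + d_out)  # 24,576 trainable adapter weights
print(f"trainable fraction: {mc_lora / full_finetune:.2%}")  # ~4.17%
```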
2.2.3. Detection Head
Multi-Scale Feature Unification and Query Construction
Decoder Layer Update
2.2.4. Adaptive Max-Confidence Fusion
Algorithm 1: Adaptive Max-Confidence Fusion (pseudocode not reproduced here)
Geometry-Aware Iterative Matching
Max-Confidence and Argmax Selection
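Algorithm 1 pairs optical and SAR detections by rotated-box overlap and angle agreement (the IoU and angle thresholds ablated in the last table), then keeps the maximum confidence of each matched pair and the box of the more confident detection (argmax). The sketch below is a simplified single-pass approximation of that behavior, not the paper's iterative matching procedure; the detection format and the shapely-based rotated-IoU helper are our own simplifications.

```python
import numpy as np
from shapely.geometry import Polygon

def obb_to_polygon(box):
    """Convert (cx, cy, w, h, angle_deg) to a shapely polygon."""
    cx, cy, w, h, ang = box
    a = np.deg2rad(ang)
    ux = np.array([np.cos(a), np.sin(a)])     # box width direction
    uy = np.array([-np.sin(a), np.cos(a)])    # box height direction
    c = np.array([cx, cy])
    corners = [c + 0.5 * (sx * w * ux + sy * h * uy)
               for sx, sy in [(-1, -1), (1, -1), (1, 1), (-1, 1)]]
    return Polygon([tuple(p) for p in corners])

def rotated_iou(box_a, box_b):
    pa, pb = obb_to_polygon(box_a), obb_to_polygon(box_b)
    inter = pa.intersection(pb).area
    return inter / (pa.area + pb.area - inter + 1e-9)

def fuse_detections(opt_dets, sar_dets, iou_thr=0.1, angle_thr=45.0):
    """Max-confidence fusion of optical and SAR oriented detections.

    Each detection is a dict {'box': (cx, cy, w, h, angle_deg), 'score': float}.
    Matched pairs keep the maximum score and the box of the more confident
    detection; unmatched detections from either modality pass through.
    """
    fused, used_sar = [], set()
    for od in opt_dets:
        best_j, best_iou = None, iou_thr
        for j, sd in enumerate(sar_dets):
            if j in used_sar:
                continue
            d_ang = abs(od["box"][4] - sd["box"][4]) % 180.0
            d_ang = min(d_ang, 180.0 - d_ang)
            if rotated_iou(od["box"], sd["box"]) >= best_iou and d_ang <= angle_thr:
                best_iou, best_j = rotated_iou(od["box"], sd["box"]), j
        if best_j is None:
            fused.append(od)                                  # optical-only detection
            continue
        sd = sar_dets[best_j]
        used_sar.add(best_j)
        winner = od if od["score"] >= sd["score"] else sd     # argmax box selection
        fused.append({"box": winner["box"],
                      "score": max(od["score"], sd["score"])})
    fused.extend(sd for j, sd in enumerate(sar_dets) if j not in used_sar)
    return fused
```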
3. Results
3.1. Implementation Details
3.2. Ablation Study Results
3.3. Comparison with Other Object Detectors
3.4. Robustness Analysis in Adverse Weather
4. Discussion
4.1. Architectural Component Analysis
4.2. Performance vs. Other Methods
4.3. Interpretation of Robustness in Adverse Weather
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Li, X.; Liu, B.; Zheng, G.; Ren, Y.; Zhang, S.; Liu, Y.; Gao, L.; Liu, Y.; Zhang, B.; Wang, F. Deep-learning-based information mining from ocean remote-sensing imagery. Natl. Sci. Rev. 2020, 7, 1584–1605. [Google Scholar] [CrossRef] [PubMed]
- Demir, B.; Bovolo, F.; Bruzzone, L. Detection of land-cover transitions in multitemporal remote sensing images with active-learning-based compound classification. IEEE Trans. Geosci. Remote Sens. 2012, 50, 1930–1941. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99. [Google Scholar] [CrossRef]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Khanam, R.; Hussain, M. What is YOLOv5: A deep look into the internal features of the popular object detector. arXiv 2024, arXiv:2407.20892. [Google Scholar] [CrossRef]
- Wang, A.; Chen, H.; Liu, L.; Li, Z.; Chai, S.; Zhang, H.; Yu, G. YOLOv10: Real-time end-to-end object detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011. [Google Scholar]
- Wang, R.; You, Y.; Zhang, Y. Ship detection in foggy remote sensing image via scene classification R-CNN. In Proceedings of the 2018 International Conference on Network Infrastructure and Digital Content (IC-NIDC), Guiyang, China, 22–24 August 2018; IEEE: New York, NY, USA, 2018; pp. 81–85. [Google Scholar]
- Zhang, Z.; Zheng, H.; Cao, J.; Liu, W. FRS-Net: An efficient ship detection network for thin-cloud and fog-covered high-resolution optical satellite imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 2326–2340. [Google Scholar] [CrossRef]
- Zhao, Z.; Li, S. OASL: Orientation-aware adaptive sampling learning for arbitrary oriented object detection. Int. J. Appl. Earth Obs. Geoinf. 2024, 128, 103740. [Google Scholar] [CrossRef]
- Zhang, T.; Zhang, X.; Liu, C. Balance learning for ship detection from synthetic aperture radar remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 2021, 182, 190–207. [Google Scholar] [CrossRef]
- Zhang, X.; Yang, X.; Li, Y.; Yang, J.; Cheng, M.-M.; Li, X. RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 11–15 June 2025; pp. 7416–7426. [Google Scholar]
- Yu, H.; Liu, B.; Wang, L.; Li, T. LD-Det: Lightweight Ship Target Detection Method in SAR Images via Dual Domain Feature Fusion. Remote Sens. 2025, 17, 1562. [Google Scholar] [CrossRef]
- Sun, Z.; Leng, X.; Zhang, X.; Zhou, Z.; Xiong, B.; Ji, K.; Kuang, G. Arbitrary-Direction SAR Ship Detection Method for Multiscale Imbalance. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5208921. [Google Scholar]
- Wang, H.; Liu, S.; Lv, Y.; Li, S. Scattering Information Fusion Network for Oriented Ship Detection in SAR Images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 4013105. [Google Scholar]
- Dong, J.; Feng, J.; Tang, X. OptiSAR-Net: A Cross-Domain Ship Detection Method for Multi-Source Remote Sensing Data. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4709311. [Google Scholar]
- He, J.; Su, N.; Xu, C.; Li, H. A cross-modality feature transfer method for target detection in SAR images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5213615. [Google Scholar] [CrossRef]
- Zhang, Z.; Zhang, L.; Wu, J. Optical and synthetic aperture radar image fusion for ship detection and recognition: Current state, challenges, and future prospects. IEEE Geosci. Remote Sens. Mag. 2024, 12, 132–168. [Google Scholar] [CrossRef]
- Zhang, T.; Zhang, X.; Li, J. SAR ship detection dataset (SSDD): Official release and comprehensive data analysis. Remote Sens. 2021, 13, 3690. [Google Scholar] [CrossRef]
- Xian, S.; Wu, Z.; Sun, Y.; Zhang, Q. AIR-SARShip-1.0: High-resolution SAR ship detection dataset. J. Radars 2019, 8, 852–863. [Google Scholar]
- Ruan, R.; Yang, K.; Zhao, Z. OGSOD-2.0: A challenging multimodal benchmark for optical-SAR object detection. In Proceedings of the Sixteenth International Conference on Graphics and Image Processing (ICGIP 2024), Nanjing, China, 8–10 November 2024; SPIE: Bellingham, WA, USA, 2025; pp. 11–21. [Google Scholar]
- Lindenberger, P.; Sarlin, P.-E.; Pollefeys, M. LightGlue: Local Feature Matching at Light Speed. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 17581–17592. [Google Scholar]
- Zhao, J.; Ding, Z.; Zhou, Y.; Xu, Y. OrientedFormer: An end-to-end transformer-based oriented object detector in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5640816. [Google Scholar] [CrossRef]
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A Large-scale Dataset for Object Detection in Aerial Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3974–3983. [Google Scholar]
- Wang, H.; Li, S.Y.; Yang, J.; Liu, Y.; Lv, Y.; Zhou, Z. Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 11–15 June 2025. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 9992–10002. [Google Scholar]
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. Int. Conf. Learn. Represent. 2022, 1, 3. [Google Scholar]
- Zhang, S.; Wang, X.; Wang, J.; Pang, J.; Lyu, C.; Zhang, W.; Luo, P.; Chen, K. Dense Distinct Query for End-to-End Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 7329–7338. [Google Scholar]
- Ding, J.; Xue, N.; Long, Y.; Lu, G. Learning RoI transformer for oriented object detection in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 2849–2858. [Google Scholar]
- Xu, Y.; Fu, M.; Wang, Q.; Wang, X. Gliding vertex on the horizontal bounding box for multi-oriented object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 1452–1459. [Google Scholar] [CrossRef]
- Yang, S.; Pei, Z.; Zhou, F.; Yu, L. Rotated Faster R-CNN for oriented object detection in aerial images. In Proceedings of the 2020 3rd International Conference on Robot Systems and Applications, Tokyo, Japan, 26–29 December 2020; ACM: New York, NY, USA, 2020; pp. 35–39. [Google Scholar]
- Yang, X.; Yan, J.; Feng, Z. R3Det: Refined single-stage detector with feature refinement for rotating object. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021; Volume 35, pp. 3163–3171. [Google Scholar]
- Xie, X.; Cheng, G.; Wang, J.; Shi, X. Oriented R-CNN for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 3520–3529. [Google Scholar]
- Han, J.; Ding, J.; Li, J.; Li, H. Align deep features for oriented object detection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5602511. [Google Scholar]
- Han, J.; Ding, J.; Xue, N.; Luo, H. ReDet: A rotation-equivariant detector for aerial object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 2786–2795. [Google Scholar]
- Yang, X.; Zhang, G.; Li, W.; Liu, T. H2RBox: Horizontal box annotation is all you need for oriented object detection. arXiv 2022, arXiv:2210.06742. [Google Scholar]
- Lee, S.; Lee, S.; Song, B.C. CFA: Coupled-hypersphere-based feature adaptation for target-oriented anomaly localization. IEEE Access 2022, 10, 78446–78454. [Google Scholar]
- Pu, Y.; Wang, Y.; Xia, Z.; Zhu, X. Adaptive rotated convolution for rotated object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 6589–6600. [Google Scholar]
- Li, Z.; Hou, B.; Wu, Z.; Chen, X. FCOSR: A simple anchor-free rotated detector for aerial object detection. Remote Sens. 2023, 15, 5499. [Google Scholar] [CrossRef]
- Lee, W.; Chang, H.; Moon, J.; Lee, J.; Kim, M. ABBSPO: Adaptive Bounding Box Scaling and Symmetric Prior based Orientation Prediction for Detecting Aerial Image Objects. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 11–15 June 2025; pp. 8848–8858. [Google Scholar]
- Chen, Y.T.; Shi, J.; Ye, Z.; Mertz, C.; Ramanan, D.; Kong, S. Multimodal Object Detection via Probabilistic Ensembling. In Computer Vision—ECCV 2022; Springer: Cham, Switzerland, 2022; pp. 145–163. [Google Scholar]
Part I: General Data Partition

| Subset | Scenes (Image Pairs) | Description | Total Instances |
|---|---|---|---|
| Training Set | 996 + 996 | SAR + RGB (Clear) | 4389 |
| Validation Set | 250 + 250 | SAR + RGB (Clear) | 1083 |
| Validation Set | 250 | RGB (Cloud-Augmented) | 549 |
| Total | 1246 + 1246 + 250 | SAR + RGB (Clear) + RGB (Cloud-Augmented) | 6021 |

Part II: Scene Distribution

| Scene Category | Count (Image Pairs) | Percentage | Note |
|---|---|---|---|
| In-shore (Ports) | 712 | 57.14% | Heavy clutter, land interference |
| Off-shore (Open Sea) | 534 | 42.86% | Sea clutter, pure background |

Part III: Ship Category/Scale Distribution

| Ship Category * | Instance Count | Scale (Pixels) | Ratio |
|---|---|---|---|
| Small Ship | 1133 | Area | 20.71% |
| Medium Ship | 3598 | Area | 65.75% |
| Large Ship | 741 | Area | 13.54% |
| DPE | MC-LoRA | MIX Recall | MIX AP50 | RGB_ONLY Recall | RGB_ONLY AP50 | SAR_ONLY Recall | SAR_ONLY AP50 |
|---|---|---|---|---|---|---|---|
| × | × | 98.5 | 73.3 | 99.4 | 83.1 | 97.6 | 58.4 |
| ✓ | × | 97.0 | 74.7 | 97.0 | 84.9 | 97.0 | 61.5 |
| × | ✓ | 98.7 | 79.5 | 99.1 | 88.0 | 98.2 | 66.5 |
| ✓ | ✓ | 99.1 | 84.1 | 99.7 | 88.8 | 99.1 | 77.0 |
| Method | Year | MIX AP25 | MIX AP50 | MIX AP75 | RGB_ONLY AP25 | RGB_ONLY AP50 | RGB_ONLY AP75 | SAR_ONLY AP25 | SAR_ONLY AP50 | SAR_ONLY AP75 |
|---|---|---|---|---|---|---|---|---|---|---|
| RoI Transformer [28] | 2019 | 79.5 | 71.5 | 58.8 | 83.4 | 83.3 | 75.1 | 74.1 | 63.8 | 39.2 |
| Gliding Vertex [29] | 2020 | 41.1 | 31.3 | 11.8 | 46.2 | 37.7 | 8.9 | 36.1 | 25.7 | 1.9 |
| Rotated Faster R-CNN [30] | 2020 | 77.5 | 61.3 | 12.2 | 83.6 | 69.5 | 17.5 | 68.9 | 46.0 | 9.4 |
| R3Det [31] | 2021 | 66.0 | 50.2 | 13.3 | 71.2 | 60.6 | 16.6 | 60.5 | 39.2 | 6.5 |
| Oriented R-CNN [32] | 2021 | 80.0 | 77.5 | 52.7 | 82.8 | 82.8 | 71.2 | 76.1 | 67.0 | 33.6 |
| S2A-Net [33] | 2021 | 84.6 | 79.5 | 38.8 | 86.9 | 86.8 | 57.7 | 81.7 | 69.3 | 16.9 |
| ReDet [34] | 2021 | 67.9 | 59.5 | 34.9 | 73.1 | 68.2 | 50.8 | 63.6 | 48.5 | 21.0 |
| H2RBox [35] | 2022 | 78.1 | 59.4 | 20.2 | 81.5 | 69.5 | 26.1 | 73.7 | 53.2 | 15.3 |
| CFA [36] | 2022 | 61.3 | 45.1 | 19.6 | 68.0 | 53.8 | 27.4 | 52.8 | 34.6 | 11.7 |
| Rotated RetinaNet [37] | 2023 | 73.4 | 57.3 | 28.3 | 79.5 | 69.6 | 38.4 | 67.4 | 54.1 | 19.0 |
| Rotated FCOS [38] | 2023 | 73.7 | 63.7 | 22.0 | 78.0 | 68.9 | 34.9 | 69.3 | 53.5 | 13.3 |
| OptiSAR-Net [15] | 2024 | 69.7 | 62.6 | 35.0 | 78.9 | 74.4 | 51.1 | 60.7 | 50.9 | 18.9 |
| OrientedFormer [22] | 2024 | 81.7 | 80.3 | 56.6 | 85.5 | 85.4 | 72.8 | 77.3 | 72.7 | 39.4 |
| ABBSPO [39] | 2025 | 85.8 | 74.7 | 37.5 | 88.3 | 86.4 | 47.7 | 82.6 | 67.5 | 23.0 |
| UCR [11] | 2025 | 86.4 | 71.9 | 22.3 | 88.6 | 84.1 | 34.1 | 83.4 | 56.2 | 13.6 |
| MOS-DETR (Ours) | 2025 | 84.6 | 84.1 | 62.8 | 88.8 | 88.8 | 82.8 | 78.5 | 77.0 | 49.6 |
| Score-Fusion | Box-Fusion | Accuracy | Precision | Recall |
|---|---|---|---|---|
| Baseline (optical-only) | – | 56.7 | 53.6 | 34.5 |
| ProbEn [40] | avg | 60.1 | 73.4 | 63.9 |
| ProbEn [40] | s-avg | 60.1 | 73.4 | 63.9 |
| ProbEn [40] | argmax | 60.1 | 73.4 | 63.9 |
| avg | avg | 65.6 | 77.2 | 69.7 |
| avg | s-avg | 65.6 | 77.1 | 69.6 |
| avg | argmax | 65.6 | 77.1 | 69.6 |
| max | avg | 77.6 | 84.1 | 86.7 |
| max | s-avg | 77.6 | 84.2 | 86.8 |
| max | argmax | 77.7 | 84.2 | 86.8 |
| Variable Parameter | IoU Threshold | Angle Threshold (°) | Accuracy | Precision | Recall |
|---|---|---|---|---|---|
| Varying IoU Threshold | 0.1 | 45 | 77.7 | 84.2 | 86.8 |
| | 0.2 | 45 | 76.3 | 82.6 | 86.9 |
| | 0.3 | 45 | 74.6 | 80.5 | 86.9 |
| | 0.5 | 45 | 67.3 | 72.3 | 86.9 |
| Varying Angle Threshold | 0.1 | 15 | 77.1 | 83.4 | 86.8 |
| | 0.1 | 30 | 77.6 | 84.2 | 86.8 |
| | 0.1 | 45 | 77.7 | 84.2 | 86.8 |
| | 0.1 | 60 | 77.6 | 84.2 | 86.8 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
You, J.; Lv, Y.; Li, S.; Liu, S.; Zhang, K.; Liu, Y. A Multi-Modal Approach for Robust Oriented Ship Detection: Dataset and Methodology. Remote Sens. 2026, 18, 274. https://doi.org/10.3390/rs18020274