Maritime Distress Target Detection Based on Improved RT-DETR: For Robust Small Target Localization
Highlights
- An improved RT-DETR-based maritime distress target detection method is proposed, integrating SFConv, the SPE module, and the Focaler-DIoU loss to improve small-target detection and multi-scale feature representation.
- The proposed method achieves an mAP@50 of 0.8347, 4.51 percentage points above the RT-DETR-r18 baseline (0.7896), while maintaining end-to-end real-time detection capability.
- The method provides a more accurate and robust solution for detecting small and complex maritime distress targets in dynamic ocean environments.
- It offers practical technical support for real-time UAV-based maritime monitoring and intelligent emergency rescue systems.
Abstract
1. Introduction
- 1. Small-object Focused Convolution (SFConv): To improve detection performance for small maritime distress targets, we designed a convolutional module specifically optimized for small-scale object detection. Combining three-layer convolution extraction, skip-connection augmentation, and structural reparameterization for inference acceleration, it effectively enhances small-object detection accuracy and model deployment performance.
- 2. Shallow Path Enhancement for Cross-scale Interaction (SPE): Addressing challenges such as the difficulty in detecting small maritime distress targets, significant scale variations, and strong background interference, this paper further proposes the SPE module. It focuses on optimizing the role of shallow-level features (P2) in cross-scale feature fusion. This module strengthens the upward propagation pathway of shallow semantic information, enhances the fine-grained expressive capability within the feature pyramid, and significantly boosts the model’s perception of multi-scale targets in complex backgrounds.
- 3. Focaler-DIoU Loss Reconstruction: The Focaler-DIoU loss function is introduced, reconstructing the DIoU loss through a linear interval mapping mechanism. This enhances the model’s attention and discrimination capabilities on challenging detection samples, thereby improving the overall robustness of the detection system.
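
The linear interval mapping behind Focaler-DIoU can be illustrated with a small sketch. It assumes the formulation of the Focaler-IoU paper (Zhang and Zhang, 2024, listed in the references), in which IoU is remapped linearly onto an interval [d, u] and combined with the DIoU loss as L_DIoU + IoU − IoU_focaler. The function names, the (x1, y1, x2, y2) box format, and the default thresholds d = 0.0 and u = 0.95 are illustrative assumptions, not the settings used in this work.

```python
def iou_and_center_penalty(box_a, box_b):
    """Return (IoU, DIoU centre-distance penalty) for boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union areas.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centres.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2
    penalty = rho2 / c2 if c2 > 0 else 0.0
    return iou, penalty


def focaler_iou(iou, d=0.0, u=0.95):
    """Linear interval mapping: 0 below d, 1 above u, linear in between.

    d and u are hypothetical placeholder thresholds.
    """
    if not 0.0 <= d < u <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= d < u <= 1")
    if iou < d:
        return 0.0
    if iou > u:
        return 1.0
    return (iou - d) / (u - d)


def focaler_diou_loss(pred, target, d=0.0, u=0.95):
    """Assumed form: L_DIoU + IoU - IoU_focaler, with L_DIoU = 1 - IoU + rho^2 / c^2."""
    iou, penalty = iou_and_center_penalty(pred, target)
    diou_loss = 1.0 - iou + penalty
    return diou_loss + iou - focaler_iou(iou, d, u)


if __name__ == "__main__":
    # A prediction shifted one unit to the right of a 2x2 ground-truth box.
    print(focaler_diou_loss((1, 0, 3, 2), (0, 0, 2, 2), d=0.0, u=0.95))
```

Raising d makes the loss ignore low-IoU samples, while lowering u saturates the loss for already well-aligned boxes, so the interval controls which samples dominate the regression gradient.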
2. Materials and Methods
2.1. Overview of the RT-DETR Network Model
2.2. Overall Structure of the Improved Model Based on RT-DETR
2.3. Improved Convolution Module SFConv
2.3.1. Design Motivation and Overall Structure
2.3.2. Implementation Details of SFConv
2.4. Cross-Scale Feature Interaction Optimization
2.4.1. Introduction of Shallow Features
2.4.2. Path-Enhanced Network
2.5. Focaler-DIoU
3. Results
3.1. Experimental Dataset
3.2. Experimental Environment and Parameter Settings
3.3. Model Performance Evaluation Metrics
3.4. Experimental Evaluation of Improved Modules
3.4.1. Comparative Experiment of SFConv Convolution Modules
- 1. Convolution Module Comparison Experiment

  To further enhance the feature extraction capability of the detection model, we systematically compared several improved convolution modules, including SFConv, by replacing the BasicBlock in the backbone network. Specifically, we evaluated the following mainstream convolution architectures: PConv (Perturbed Convolution) [20], DBB (Diverse Branch Block) [21], DEConv (Detail-enhanced Convolution) [22], DRB (Dilated Reparam Block) [23], DualConv [24], DySnakeConv [25], RFCBAMConv [26], WTConv [27], and our proposed SFConv. All modules were substituted into the same rtdetr-r18 model and evaluated under identical training configurations. Experimental results are shown in Table 3, and Figure 6 presents a visual comparison of the same data.

  SFConv achieved the highest mAP@50 of all compared models, 0.79203, and was the only replacement module to exceed the baseline (0.7896). Its precision (P) reached 0.94208, indicating stable target discrimination. It also achieved 0.44332 on mAP@50:95, ahead of every other module in the comparison (e.g., RFCBAMConv’s 0.39629 and DRB’s 0.38613).

  These results suggest that, while modules such as PConv and WTConv raise precision above the baseline, SFConv captures and transmits multi-scale feature information more effectively while remaining structurally compact, yielding the best overall detection accuracy in this comparison.
- 2. Comparative Experiments on the Structural Reparameterization Mechanism

  To validate the effectiveness of SFConv’s structural reparameterization mechanism, two sets of comparative experiments were designed (Table 4): a baseline configuration (SFConv only) and an enhanced configuration (SFConv + SPE + Focaler-DIoU). Each configuration was evaluated with and without structural reparameterization to compare computational efficiency and model complexity. Visual comparison results are shown in Figure 7.
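
The general idea of structural reparameterization can be sketched as follows. The example is a RepVGG-style merge on a single channel and is not the SFConv implementation: a 3×3 branch, a 1×1 branch, and an identity skip are folded into one 3×3 kernel that produces the same output, which is consistent with the lower layer counts reported for the reparameterized models in Table 4. Batch normalization, multiple channels, and the actual branch layout of SFConv are omitted, and all function names are hypothetical.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (odd square kernel)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out


def multi_branch(image, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch + identity skip."""
    a = conv2d_same(image, k3)
    b = conv2d_same(image, [[k1]])
    h, w = len(image), len(image[0])
    return [[a[i][j] + b[i][j] + image[i][j] for j in range(w)] for i in range(h)]


def merge_branches(k3, k1):
    """Inference-time form: fold the 1x1 weight and the identity into the 3x3 centre tap."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1 + 1.0
    return merged


# Illustrative values only.
image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[0.5, 0.0, -0.5], [1.0, 0.25, -1.0], [0.5, 0.0, -0.5]]
k1 = 2.0

train_out = multi_branch(image, k3, k1)
infer_out = conv2d_same(image, merge_branches(k3, k1))
max_diff = max(abs(a - b) for ra, rb in zip(train_out, infer_out) for a, b in zip(ra, rb))
print("max difference between the two forms:", max_diff)
```

Because convolution is linear, the three branches collapse into one kernel without changing the output, so the single-branch form needs fewer layers at inference time.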
3.4.2. Parameter Sensitivity Analysis of Loss Function
3.4.3. Loss Function Comparison Experiment
3.5. Ablation Studies
3.6. Full Dataset Validation Experiment
3.7. Visualization Effect Comparison
- 1. Enhanced small object detection capability: The improved model can effectively detect smaller-scale objects in the image;
- 2.
- 3.
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- [1] Su, N.; Chen, X.; Guan, J.; Huang, Y. Maritime target detection based on radar graph data and graph convolutional network. IEEE Geosci. Remote Sens. Lett. 2022, 19, 4019705.
- [2] Yang, P.; Dong, L.; Xu, W. Small maritime target detection using gradient vector field characterization of infrared image. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 1827–1841.
- [3] Zhang, C.S.; Liu, Y.; Song, J.; Sun, S.; Qian, S. Maritime target detection of PCL system based on non-cooperative pulse radar. In Proceedings of the 2024 IEEE International Conference on Signal, Information and Data Processing (ICSIDP), Zhuhai, China, 22–24 November 2024; pp. 1–6.
- [4] Liu, W.; Zhu, C.; Liu, Y.; Li, Z. Maritime target detection method based on feature fusion of visible and infrared images. In Proceedings of the 2025 8th International Conference on Advanced Algorithms and Control Engineering (ICAACE), Shanghai, China, 21–23 March 2025; pp. 2487–2490.
- [5] Xu, Z.Y. Aerial small target detection method based on improved Faster R-CNN. Shipbuild. Stand. Qual. 2024, 4, 30–37.
- [6] Hou, Y.; Wu, Y.; Kou, X.R.; Huang, J.C.; Tuo, J.D.; Wang, Y.Q.; Huang, X.J. Small object detection algorithm for UAV images based on improved YOLOv8. Comput. Eng. Appl. 2025, 61, 83–92.
- [7] Fu, Z.; Xiao, Y.; Tao, F.; Si, P.; Zhu, L. DLSW-YOLOv8n: A Novel Small Maritime Search and Rescue Object Detection Framework for UAV Images with Deformable Large Kernel Net. Drones 2024, 8, 310.
- [8] Xu, M.; Ma, L.; Jiang, Y. Spatio-temporal and global context information fusion based vehicle re-identification algorithm. J. Highw. Transp. Res. Dev. 2025, 42, 21–28.
- [9] Zhang, Y.; Chen, G.; Zhang, P.; Tong, J.Y.; Shan, M.J.; Shan, H.L. Underwater target detection based on enhanced local features. China Meas. Test 2025, 51, 151–158.
- [10] Liu, K.; Qi, Y.; Xu, G.; Li, J. YOLOv5s maritime distress target detection method based on swin transformer. IET Image Process. 2024, 18, 1258–1267.
- [11] Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs beat YOLOs on real-time object detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 16965–16974.
- [12] Neubeck, A.; van Gool, L. Efficient non-maximum suppression. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Los Alamitos, CA, USA, 20–24 August 2006; pp. 850–855.
- [13] Zhang, H.; Zhang, S. Focaler-IoU: More focused intersection over union loss. arXiv 2024, arXiv:2401.10525.
- [14] He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- [15] Wan, C.; Yu, H.; Li, Z.; Chen, Y.; Zou, Y.; Liu, Y.; Yin, X.; Zuo, K. Swift Parameter-free Attention Network for Efficient Super-Resolution. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 16–22 June 2024; pp. 6246–6256.
- [16] Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. RepVGG: Making VGG-style ConvNets Great Again. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Online, 19–25 June 2021; pp. 13728–13737.
- [17] Cai, Z.; Ding, X.; Shen, Q.; Cao, X. Refconv: Reparameterized refocusing convolution for powerful convnets. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 11617–11631.
- [18] Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. Proc. AAAI Conf. Artif. Intell. 2020, 34, 12993–13000.
- [19] Varga, L.A.; Kiefer, B.; Messmer, M.; Zell, A. SeaDronesSee: A Maritime Benchmark for Detecting Humans in Open Water. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 3686–3696.
- [20] Park, S.; Yeo, Y.J.; Shin, Y.G. PConv: Simple yet effective convolutional layer for generative adversarial network. Neural Comput. Appl. 2021, 34, 7113–7124.
- [21] Ding, X.; Zhang, X.; Han, J.; Ding, G. Diverse Branch Block: Building a Convolution as an Inception-like Unit. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Los Alamitos, CA, USA, 19–25 June 2021; pp. 10881–10890.
- [22] Chen, Z.X.; He, Z.W.; Lu, Z.M. DEA-Net: Single image dehazing based on detail-enhanced convolution and content-guided attention. IEEE Trans. Image Process. 2024, 33, 1002–1015.
- [23] Ding, X.; Zhang, Y.; Ge, Y.; Zhao, S.; Song, L.; Yue, X.; Shan, Y. Unireplknet: A universal perception large-kernel convnet for audio video point cloud time-series and image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 5513–5524.
- [24] Zhong, J.C.; Chen, J.Y.; Mian, A. DualConv: Dual convolutional kernels for lightweight deep neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 9528–9535.
- [25] Qi, Y.; He, Y.; Qi, X.; Zhang, Y.; Yang, G. Dynamic Snake Convolution based on Topological Geometric Constraints for Tubular Structure Segmentation. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 6047–6056.
- [26] Zhang, X.; Liu, C.; Yang, D.; Song, T.; Ye, Y.; Li, K.; Song, Y. RFAConv: Innovating spatial attention and standard convolutional operation. arXiv 2023, arXiv:2304.03198.
- [27] Finder, S.E.; Amoyal, R.; Treister, E.; Freifeld, O. Wavelet Convolutions for Large Receptive Fields. In Computer Vision – ECCV 2024; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2025; Volume 15112.
- [28] Ma, S.; Xu, Y. MPDIoU: A loss for efficient and accurate bounding box regression. arXiv 2023, arXiv:2307.07662.
- [29] Zhang, H.; Xu, C.; Zhang, S. Inner-IoU: More effective intersection over union loss with auxiliary bounding box. arXiv 2023, arXiv:2311.02877.

| Category | Training Set | Validation Set |
|---|---|---|
| Number of Sub-datasets | 893 | 155 |
| Number of Annotations (Buoy) | 434 | 59 |
| Number of Annotations (Boat) | 1380 | 234 |
| Number of Annotations (Swimmer) | 3666 | 624 |
| Number of Annotations (Jet Ski) | 228 | 34 |
| Number of Annotations (Life-saving Equipment) | 88 | 36 |

| Parameter | Value |
|---|---|
| Input Image Size | |
| Initial Learning Rate | 0.0001 |
| Final Learning Rate Factor | 1.0 |
| Momentum | 0.9 |
| Batch Size | 16 |
| Number of Epochs | 300 |
| Warm-up Epochs | 16 |
| Optimizer | AdamW |

| Method | P | R | mAP@50 | mAP@50:95 |
|---|---|---|---|---|
| rtdetr-r18 | 0.9221 | 0.7619 | 0.7896 | 0.4317 |
| PConv | 0.9373 | 0.7468 | 0.7804 | 0.4194 |
| DBB | 0.9101 | 0.7408 | 0.7665 | 0.4226 |
| DEConv | 0.9317 | 0.7549 | 0.7759 | 0.4186 |
| DRB | 0.9125 | 0.7220 | 0.7335 | 0.3861 |
| DualConv | 0.9179 | 0.7596 | 0.7669 | 0.4130 |
| DySnakeConv | 0.9252 | 0.7244 | 0.7496 | 0.4239 |
| RFCBAMConv | 0.9136 | 0.7131 | 0.7440 | 0.3969 |
| WTConv | 0.9386 | 0.7332 | 0.7615 | 0.4127 |
| SFConv | 0.9420 | 0.7493 | 0.7920 | 0.4433 |

| Model | Re-Parameterization | Layers | Parameters | GFLOPs | FPS |
|---|---|---|---|---|---|
| SFConv | No | 379 | 23,646,056 | 67.1 | 43.8 |
| SFConv | Yes | 299 | 19,879,464 | 57.0 | 46.8 |
| SFConv + SPE + Focaler-DIoU | No | 418 | 22,368,200 | 88.3 | 38.7 |
| SFConv + SPE + Focaler-DIoU | Yes | 338 | 18,601,608 | 78.2 | 40.6 |
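
The effect of reparameterization on the enhanced configuration can be read off the last two rows above as relative changes. The short calculation below is only a convenience sketch: the numbers are copied from those two rows, and the helper name `relative_change_percent` is hypothetical.

```python
# Values copied from the two SFConv + SPE + Focaler-DIoU rows of the table above.
without_rep = {"Layers": 418, "Parameters": 22_368_200, "GFLOPs": 88.3, "FPS": 38.7}
with_rep = {"Layers": 338, "Parameters": 18_601_608, "GFLOPs": 78.2, "FPS": 40.6}


def relative_change_percent(before, after):
    """Signed change of `after` relative to `before`, in percent."""
    return (after - before) / before * 100.0


changes = {
    name: relative_change_percent(without_rep[name], with_rep[name])
    for name in without_rep
}

for name, value in changes.items():
    print(f"{name}: {value:+.1f}%")
```

Negative values mean the reparameterized model is smaller or cheaper on that column, and a positive FPS change means it runs faster.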

| Method | P | R | mAP@50 | mAP@50:95 |
|---|---|---|---|---|
| rtdetr-r18 | 0.9221 | 0.7619 | 0.7896 | 0.4317 |
| + MPDIoU [28] | 0.9355 | 0.7569 | 0.7832 | 0.4271 |
| + Inner-DIoU [29] | 0.9257 | 0.7429 | 0.7892 | 0.4180 |
| + Focaler-GIoU | 0.9321 | 0.7205 | 0.7695 | 0.4207 |
| + Focaler-EIoU | 0.9361 | 0.7230 | 0.7430 | 0.4117 |
| + Inner-MPDIoU | 0.9200 | 0.7432 | 0.7848 | 0.4323 |
| + Focaler-MPDIoU | 0.9378 | 0.7366 | 0.7658 | 0.4112 |
| + Proposed | 0.9231 | 0.7602 | 0.7974 | 0.4244 |

| Method | P | R | mAP@50 | mAP@50:95 |
|---|---|---|---|---|
| RT-DETR-r18 | 0.9221 | 0.7619 | 0.7896 | 0.4317 |
| + SFConv | 0.9420 | 0.7479 | 0.7920 | 0.4433 |
| + SPE | 0.9381 | 0.7735 | 0.7969 | 0.4554 |
| + Focaler-DIoU | 0.9239 | 0.7690 | 0.7947 | 0.4244 |
| + SFConv + SPE | 0.9007 | 0.7695 | 0.8054 | 0.4445 |
| + SFConv + SPE + Focaler-DIoU | 0.9177 | 0.7955 | 0.8347 | 0.4719 |
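
The gain of the full model over the baseline can be expressed either in absolute percentage points or as a relative improvement, and the two should not be confused. The sketch below computes both from the first and last rows above; the helper names are hypothetical.

```python
# First (RT-DETR-r18) and last (+ SFConv + SPE + Focaler-DIoU) rows of the table above.
baseline = {"mAP@50": 0.7896, "mAP@50:95": 0.4317}
full_model = {"mAP@50": 0.8347, "mAP@50:95": 0.4719}


def gain_points(base, new):
    """Absolute gain in percentage points."""
    return (new - base) * 100.0


def gain_relative(base, new):
    """Gain relative to the baseline value, in percent."""
    return (new - base) / base * 100.0


gains = {
    metric: (
        gain_points(baseline[metric], full_model[metric]),
        gain_relative(baseline[metric], full_model[metric]),
    )
    for metric in baseline
}

for metric, (points, relative) in gains.items():
    print(f"{metric}: +{points:.2f} points, +{relative:.2f}% relative")
```

For mAP@50, the absolute gain is 4.51 percentage points, which corresponds to a relative improvement of about 5.7% over the baseline value.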

| Method | P | R | mAP@50 | mAP@50:95 |
|---|---|---|---|---|
| RT-DETR-r18 | 0.8409 | 0.8337 | 0.8559 | 0.5809 |
| EfficientViT | 0.9021 | 0.6814 | 0.7166 | 0.4032 |
| YOLOv10s | 0.8730 | 0.5908 | 0.6197 | 0.4032 |
| YOLO11s | 0.9018 | 0.5757 | 0.5975 | 0.3394 |
| YOLOv12s | 0.8969 | 0.5450 | 0.5750 | 0.3227 |
| YOLOv13s | 0.8980 | 0.5252 | 0.5123 | 0.5690 |
| RT-DETR-SSF | 0.8500 | 0.8392 | 0.8599 | 0.5857 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Liu, K.; Chang, X.; Liu, Z.; Xu, J.; Zhang, Y.; Liu, Y. Maritime Distress Target Detection Based on Improved RT-DETR: For Robust Small Target Localization. Remote Sens. 2026, 18, 1908. https://doi.org/10.3390/rs18121908