DFA-YOLO: Deformable Spatial Attention and Hierarchical Fusion for Robust Object Detection in Adverse Weather
Abstract
1. Introduction
- (1) The C3k2-DSA module is proposed, which integrates deformable spatial attention into the original C3k2 bottleneck. It dynamically adjusts the sampling and response regions, enabling the backbone network to extract features adaptively in complex visual scenes.
- (2) A Hierarchical Multi-Scale Fusion Module (HMFM) is designed, which combines hierarchical multi-scale channel attention with enhanced spatial attention and a global context-aware mechanism. The design aims to improve the model’s feature discrimination in environments with low visibility and fog interference.
- (3) The Small-Target Wasserstein-Adaptive WIoU (SWAWIoU) loss is proposed. Building on the classical Wasserstein distance, it adds small-object adaptive weighting and a dynamic gradient adjustment mechanism based on Wise-IoU (WIoU). It is intended to improve localization robustness for small objects under poor visibility while remaining numerically stable during training.
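
To make the starting point of contribution (3) concrete, the following is a minimal sketch of the normalized Gaussian Wasserstein distance (NWD) between two boxes, which also appears as a baseline in the loss comparison. It is not the proposed loss: the small-object adaptive weighting and the WIoU-based gradient adjustment are not specified in this excerpt and are omitted. The function names, the `(cx, cy, w, h)` box layout, and the normalizing constant `c = 12.8` are assumptions chosen for illustration.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein distance between two boxes.

    Each box is (cx, cy, w, h) and is modeled as a 2D Gaussian with
    mean (cx, cy) and covariance diag(w^2 / 4, h^2 / 4).
    `c` is a dataset-dependent normalizer (assumed value, not from the paper).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two box Gaussians.
    w2_sq = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    # Exponential normalization maps the distance into (0, 1].
    return math.exp(-math.sqrt(w2_sq) / c)


def nwd_loss(box_a, box_b, c=12.8):
    """Regression loss: 0 for identical boxes, approaching 1 as they diverge."""
    return 1.0 - nwd(box_a, box_b, c)


if __name__ == "__main__":
    pred = (13.0, 14.0, 4.0, 4.0)    # hypothetical predicted box
    target = (10.0, 10.0, 4.0, 4.0)  # hypothetical ground-truth box
    print(nwd(pred, target), nwd_loss(pred, target))
```

Unlike IoU, this similarity stays non-zero and smooth when two small boxes do not overlap, which is the property that makes Wasserstein-based terms attractive for small-object localization.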
2. Related Work
2.1. Object Detection in Adverse Weather
2.2. Attention Mechanism
3. Methodology
3.1. C3k2-DSA Module
3.2. Hierarchical Multi-Scale Fusion Module
3.3. Small-Target Wasserstein-Adaptive WIoU Loss
4. Experiments and Results
4.1. Experimental Datasets
4.2. Experimental Environment and Evaluation Metrics
4.3. Ablation Experiment
4.4. Comparison with Other SOTA Models
4.5. Visual Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

| Parameter | Configuration |
|---|---|
| CPU | 15 vCPU Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80 GHz |
| GPU | RTX 3090 (24 GB) |
| Operating system | Ubuntu 20.04 |
| Framework | PyTorch 2.0 |
| Programming language | Python 3.8.10 |
| CUDA | 11.8 |

| Base | C3k2-DSA | HMFM | SWAWIoU | mAP@50 | mAP@50-95 | P | R | GFLOPs | Params (M) | Time (ms) |
|---|---|---|---|---|---|---|---|---|---|---|
| ✔ | | | | 72.3 | 48.0 | 77.3 | 62.8 | 6.3 | 2.58 | 7.9 |
| ✔ | ✔ | | | 73.1 | 48.3 | 76.3 | 63.6 | 6.3 | 2.54 | 9.2 |
| ✔ | | ✔ | | 73.8 | 48.7 | 79.3 | 63.6 | 6.8 | 2.69 | 8.0 |
| ✔ | | | ✔ | 73.5 | 48.6 | 76.3 | 63.2 | 6.3 | 2.58 | 7.5 |
| ✔ | ✔ | ✔ | | 74.3 | 49.3 | 79.6 | 65.9 | 6.6 | 2.68 | 10.2 |
| ✔ | ✔ | ✔ | ✔ | 75.7 | 50.8 | 80.3 | 67.3 | 6.6 | 2.68 | 10.0 |

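
As a reading aid, the net effect of the full configuration can be obtained by subtracting the first row of the ablation table above (baseline only) from its last row (all components enabled). The sketch below does this arithmetic in Python; the dictionary names and the `deltas` helper are illustrative and not part of the paper, and the values are copied from those two rows.

```python
# First row of the ablation table: baseline only.
baseline = {"mAP@50": 72.3, "mAP@50-95": 48.0, "P": 77.3, "R": 62.8, "Time (ms)": 7.9}
# Last row of the ablation table: all three components enabled.
full = {"mAP@50": 75.7, "mAP@50-95": 50.8, "P": 80.3, "R": 67.3, "Time (ms)": 10.0}


def deltas(base, other):
    """Per-metric difference (other - base), rounded to one decimal place."""
    return {name: round(other[name] - base[name], 1) for name in base}


gains = deltas(baseline, full)
for name, value in gains.items():
    print(f"{name}: {value:+.1f}")
```

The accuracy metrics are in percentage points, so the full model gains 3.4 points of mAP@50 and 2.8 points of mAP@50-95 at a cost of 2.1 ms of additional inference time per image.
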
| Base | C3k2-DSA | HMFM | SWAWIoU | mAP@50 | mAP@50-95 | P | R | GFLOPs | Params (M) | Time (ms) |
|---|---|---|---|---|---|---|---|---|---|---|
| ✔ | | | | 56.0 | 34.4 | 65.4 | 50.8 | 6.3 | 2.58 | 5.2 |
| ✔ | ✔ | | | 56.5 | 34.5 | 66.1 | 51.4 | 6.3 | 2.54 | 6.4 |
| ✔ | | ✔ | | 58.3 | 35.9 | 68.9 | 52.6 | 6.8 | 2.69 | 6.1 |
| ✔ | | | ✔ | 58.1 | 35.5 | 72.4 | 51.6 | 6.3 | 2.58 | 4.9 |
| ✔ | ✔ | ✔ | | 58.6 | 35.7 | 69.2 | 52.1 | 6.6 | 2.68 | 6.8 |
| ✔ | ✔ | ✔ | ✔ | 59.6 | 36.4 | 71.9 | 53.6 | 6.6 | 2.68 | 6.6 |

| Metric | CIoU | EIoU | SIoU | WIoU | ShapeIoU | MPDIoU | NWD | Ours |
|---|---|---|---|---|---|---|---|---|
| mAP50 | 74.3 | 74.0 | 73.8 | 74.9 | 74.2 | 75.2 | 73.9 | 75.7 |
| mAP50-95 | 49.3 | 49.2 | 48.9 | 50.0 | 48.1 | 49.9 | 49.7 | 50.8 |

| Model | P | R | mAP50 | mAP50-95 | GFLOPs |
|---|---|---|---|---|---|
| SSD | 77.9 | 48.4 | 63.1 | 37.4 | 30.63 |
| Faster R-CNN | 77.4 | 57.3 | 69.3 | 43.7 | 37.54 |
| AOD+Faster R-CNN | 77.5 | 57.2 | 67.6 | 43.1 | 37.54 |
| IA-YOLO | 51.1 | 36.3 | 38.2 | 21.7 | / |
| RetinaNet | 67.2 | 62.1 | 64.8 | 38.4 | 170.1 |
| RT-DETR | 71.0 | 59.6 | 65.7 | 43.0 | 103.5 |
| Deformable DETR | 72.8 | 64.7 | 71.8 | 47.2 | 160 |
| YOLOv5 | 74.9 | 64.3 | 72.6 | 48.1 | 7.1 |
| YOLOv8 | 78.1 | 66.4 | 72.3 | 48.8 | 8.1 |
| YOLOv10 | 80.5 | 60.8 | 72.4 | 47.3 | 6.5 |
| YOLOv11 | 77.3 | 62.8 | 72.3 | 48.0 | 6.3 |
| Hyper-YOLO | 78.9 | 67.9 | 73.7 | 49.6 | 9.7 |
| Ours | 80.3 | 67.3 | 75.7 | 50.8 | 6.6 |

| Model | P | R | mAP50 | mAP50-95 | GFLOPs |
|---|---|---|---|---|---|
| SSD | 56.9 | 38.4 | 45.1 | 23.4 | 30.63 |
| Faster R-CNN | 65.4 | 57.3 | 54.3 | 30.7 | 37.54 |
| Zero-DCE+Faster R-CNN | 65.1 | 56.9 | 54.1 | 30.1 | 37.54 |
| IA-YOLO | 56.6 | 49.2 | 40.7 | 23.6 | / |
| RetinaNet | 58.2 | 50.1 | 48.4 | 25.3 | 170.1 |
| RT-DETR | 67.8 | 52.6 | 58.2 | 36.4 | 103.5 |
| Deformable DETR | 66.4 | 50.1 | 56.3 | 34.6 | 160 |
| YOLOv5 | 65.6 | 51.5 | 52.4 | 28.0 | 7.1 |
| YOLOv8 | 69.8 | 51.1 | 58.4 | 35.4 | 8.1 |
| YOLOv10 | 67.6 | 50.2 | 55.6 | 33.8 | 6.5 |
| YOLOv11 | 65.4 | 50.8 | 56.0 | 34.4 | 6.3 |
| Hyper-YOLO | 69.8 | 53.8 | 59.9 | 36.6 | 9.7 |
| Ours | 71.9 | 53.6 | 59.6 | 36.4 | 6.6 |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Xie, L.; Cheng, L. DFA-YOLO: Deformable Spatial Attention and Hierarchical Fusion for Robust Object Detection in Adverse Weather. Sensors 2026, 26, 2229. https://doi.org/10.3390/s26072229

