Research on Field Weed Target Detection Algorithm Based on Deep Learning
Abstract
1. Introduction
2. Related Work
2.1. Simplified Attention Convolution
2.2. Spatial Pyramid Pooling
2.3. Feedforward Network
3. Method
3.1. Overall Framework
3.2. Spatial Channel Conv Block Module
3.3. Spatial Pyramid Pooling Fast_Edge Gaussian Aggregation Super Module
3.4. Efficient Multi-Scale Spatial-Feedforward Network Module
4. Experiment
4.1. Dataset
4.2. Environment Setup
4.3. Attention Comparison Method
4.4. Comparison Method
4.5. Evaluation Index
4.6. Ablation Experiment
4.7. Comparative Experiment
4.8. Generalization Experiment
4.9. Visualization Analysis
4.10. Experimental Results and Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References

| Configuration | vivo Z5 | HP Shadow Sprite 8 (PC) | Raspberry Pi 4B |
|---|---|---|---|
| CPU, memory, and storage | Qualcomm Snapdragon 712; 6 GB RAM + 256 GB storage | Intel Core i5-12500H; 16 GB RAM | ARM Cortex-A72, 1.5 GHz (quad-core); 32 GB SD card |
| GPU | - | NVIDIA RTX 3050 Ti (4 GB) | VideoCore VI, 500 MHz |
| Camera | Rear camera, 48 megapixels; resolution 4096 × 3027 | - | CSI camera, 8 megapixels; 1080p30 video |
| Operating system | Funtouch OS 9 | Windows 11 | Linux 2014 64 |
| Accelerated environment | - | CUDA 11.7 | - |
| Library | - | Torch 1.12.0; Torchvision 0.13.0 | Torch; OpenCV |

| Model | P (%) | R (%) | Params (M) | FLOPs (G) | mAP50 (%) | mAP50-95 (%) |
|---|---|---|---|---|---|---|
| YOLOv9t | 71.4 | 63.3 | 1.7 | 6.5 | 68.8 | 45.2 |
| YOLOv9t + SCB | 78.8 | 69.9 | 1.7 | 25.8 | 75.8 | 52.0 |
| YOLOv9t + SE | 73.8 | 65.6 | 3.1 | 12.1 | 70.3 | 47.1 |
| YOLOv9t + CBAM | 76.6 | 65.9 | 3.0 | 12.7 | 70.8 | 46.9 |
| YOLOv9t + CA | 77.2 | 67.0 | 3.0 | 12.2 | 71.2 | 48.1 |

| Model | P (%) | R (%) | Params (M) | FLOPs (G) | mAP50 (%) | mAP50-95 (%) |
|---|---|---|---|---|---|---|
| YOLOv5n | 92.2 | 88.3 | 2.1 | 5.9 | 91.7 | 85.9 |
| YOLOv8n | 92.3 | 85.8 | 2.7 | 6.9 | 93.0 | 87.9 |
| YOLOv9t | 92.7 | 87.0 | 1.7 | 6.5 | 93.0 | 87.7 |
| YOLOv9s | 94.4 | 87.4 | 6.2 | 22.3 | 94.0 | 89.0 |
| YOLOv10n | 92.8 | 88.3 | 2.7 | 8.4 | 93.4 | 88.3 |
| YOLOv10s | 94.2 | 89.1 | 8.0 | 24.6 | 94.1 | 89.5 |
| YOLOv11n | 93.3 | 86.7 | 2.6 | 6.4 | 92.8 | 87.8 |
| YOLOv11s | 94.7 | 88.8 | 9.4 | 21.5 | 94.6 | 89.3 |
| YOLOv12n | 93.4 | 88.6 | 2.5 | 6.3 | 94.0 | 88.4 |
| SSS-YOLO | 95.5 | 89.3 | 1.9 | 28.4 | 95.3 | 91.2 |

| Model | P (%) | R (%) | Params (M) | FLOPs (G) | mAP50 (%) | mAP50-95 (%) |
|---|---|---|---|---|---|---|
| YOLOv9t | 71.4 | 63.3 | 1.7 | 6.5 | 68.8 | 45.2 |
| YOLOv9t + SCB | 78.8 | 69.9 | 1.7 | 25.8 | 75.8 | 52.0 |
| YOLOv9t + SPPF_EGAS | 77.0 | 64.5 | 1.7 | 6.6 | 71.5 | 47.2 |
| YOLOv9t + EMSN | 77.0 | 64.5 | 1.8 | 7.2 | 72.5 | 48.3 |
| YOLOv9t + SPPF_EGAS + EMSN | 76.1 | 67.6 | 1.9 | 7.2 | 73.8 | 48.9 |
| SSS-YOLO | 79.2 | 69.8 | 1.9 | 28.7 | 77.4 | 52.7 |
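
The ablation rows above are easier to compare as differences from the YOLOv9t baseline. The following is a minimal sketch that derives those differences from the table values only. The function name `gains_over_baseline` and the dictionary layout are illustrative assumptions, not part of the original work.

```python
# Values copied from the ablation table above, in column order:
# (P %, R %, Params M, FLOPs G, mAP50 %, mAP50-95 %)
ABLATION = {
    "YOLOv9t": (71.4, 63.3, 1.7, 6.5, 68.8, 45.2),
    "YOLOv9t + SCB": (78.8, 69.9, 1.7, 25.8, 75.8, 52.0),
    "YOLOv9t + SPPF_EGAS": (77.0, 64.5, 1.7, 6.6, 71.5, 47.2),
    "YOLOv9t + EMSN": (77.0, 64.5, 1.8, 7.2, 72.5, 48.3),
    "YOLOv9t + SPPF_EGAS + EMSN": (76.1, 67.6, 1.9, 7.2, 73.8, 48.9),
    "SSS-YOLO": (79.2, 69.8, 1.9, 28.7, 77.4, 52.7),
}


def gains_over_baseline(rows, baseline="YOLOv9t"):
    """Hypothetical helper: per-model change relative to the baseline row.

    Returns {model: (delta mAP50 in points, delta mAP50-95 in points,
    FLOPs ratio)}.
    """
    base = rows[baseline]
    gains = {}
    for name, row in rows.items():
        if name == baseline:
            continue
        gains[name] = (
            round(row[4] - base[4], 1),  # mAP50 gain, percentage points
            round(row[5] - base[5], 1),  # mAP50-95 gain, percentage points
            round(row[3] / base[3], 2),  # FLOPs relative to the baseline
        )
    return gains


GAINS = gains_over_baseline(ABLATION)
for model, (d_map50, d_map5095, flops_ratio) in GAINS.items():
    print(f"{model:28s} mAP50 {d_map50:+.1f}  mAP50-95 {d_map5095:+.1f}  FLOPs x{flops_ratio}")
```

Read this way, the table shows SSS-YOLO gaining 8.6 points of mAP50 and 7.5 points of mAP50-95 over YOLOv9t, at roughly 4.4 times the FLOPs. Most of that extra computation comes with the SCB row.
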

| Model | P (%) | R (%) | Params (M) | FLOPs (G) | FPS | mAP50 (%) | mAP50-95 (%) |
|---|---|---|---|---|---|---|---|
| YOLOv5n | 75.7 | 64.2 | 2.1 | 5.9 | 417 | 70.4 | 45.4 |
| YOLOv8n | 76.2 | 65.9 | 2.7 | 6.9 | 366 | 71.4 | 46.7 |
| YOLOv9t | 71.4 | 63.3 | 1.7 | 6.5 | 302 | 68.8 | 45.2 |
| YOLOv9s | 76.2 | 66.7 | 6.2 | 22.3 | 239 | 73.0 | 49.6 |
| YOLOv9m | 81.8 | 66.4 | 16.6 | 60.4 | 183 | 75.3 | 52.0 |
| YOLOv10n | 73.3 | 62.3 | 2.7 | 8.4 | 289 | 68.1 | 45.6 |
| YOLOv10s | 77.0 | 62.8 | 8.0 | 24.6 | 177 | 70.1 | 47.6 |
| YOLOv10m | 78.4 | 64.9 | 16.5 | 63.8 | 109 | 72.6 | 50.0 |
| YOLOv11n | 73.5 | 66.8 | 2.6 | 6.3 | 376 | 71.0 | 46.5 |
| YOLOv11s | 81.1 | 64.6 | 9.4 | 21.5 | 189 | 73.3 | 49.4 |
| YOLOv11m | 82.1 | 65.0 | 20.1 | 68.1 | 104 | 74.6 | 51.1 |
| YOLOv12n | 78.9 | 64.0 | 2.5 | 6.3 | 401 | 72.5 | 47.6 |
| SSS-YOLO | 79.2 | 69.8 | 1.9 | 28.7 | 297 | 77.4 | 52.7 |
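
Precision and recall in the comparison table above pull in different directions for several models, so a single combined score can help when reading the rows. The sketch below computes the standard F1 score (the harmonic mean of precision and recall) for a few rows. F1 is not one of the columns shown here, so these are derived values rather than reported results, and the names `PRECISION_RECALL` and `f1_score` are illustrative.

```python
# (P %, R %) pairs copied from the comparison table above.
PRECISION_RECALL = {
    "YOLOv9t": (71.4, 63.3),
    "YOLOv9m": (81.8, 66.4),
    "YOLOv12n": (78.9, 64.0),
    "SSS-YOLO": (79.2, 69.8),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Derived values, not reported in the source tables.
F1 = {name: f1_score(p, r) for name, (p, r) in PRECISION_RECALL.items()}

for name, value in sorted(F1.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} F1 = {value:.1f}")
```

Among these four rows, SSS-YOLO has the highest derived F1 (about 74.2, against about 67.1 for the YOLOv9t baseline), which is consistent with its higher recall at similar precision.
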
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Chen, Z.; Wu, L.; Jia, Z.; Wang, J.; Zhou, G.; Zhang, Z. Research on Field Weed Target Detection Algorithm Based on Deep Learning. Sensors 2026, 26, 677. https://doi.org/10.3390/s26020677

