Wheat Head Detection in Field Environments Based on an Improved YOLOv11 Model
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Collection
2.2. Dataset Construction
2.3. Experimental Environment and Configuration
2.4. YOLO v11n Network Architecture
2.5. YOLO v11n Model Optimization
2.5.1. Global Edge Information Transfer Module Architecture
2.5.2. RFCAConv-Based C3k2_RFCAConv Module
2.5.3. Normalized Wasserstein Distance Loss Function
2.5.4. YOLO v11n-GRN Network Model
3. Results and Analysis
3.1. Evaluation Index
3.2. Model Performance Comparative Experiments
3.3. Ablation Experiment
3.4. Mean Average Precision Comparative Analysis
3.5. Model Detection Performance Comparison
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Network Model | Precision/% | Recall/% | mAP@0.5/% | Parameters/MB | GFLOPs | FPS |
---|---|---|---|---|---|---|
Faster R-CNN | 51.4 | 72.0 | 58.9 | 137.1 | 370.2 | 18.8 |
RT-DETR | 91.3 | 89.4 | 94.2 | 32.8 | 103.4 | 74.2 |
YOLO X | 91.2 | 88.0 | 93.6 | 8.9 | 26.8 | 70.6 |
YOLO v5n | 90.7 | 89.0 | 93.9 | 2.6 | 5.8 | 72.1 |
YOLO v8n | 91.0 | 89.3 | 93.9 | 3.0 | 8.1 | 77.0 |
YOLO v11n | 91.2 | 88.8 | 94.1 | 2.6 | 6.4 | 73.2 |
YOLO v11s | 91.0 | 90.6 | 94.3 | 9.4 | 21.3 | 88.2 |
YOLO v12n | 91.3 | 88.9 | 94.0 | 2.5 | 5.8 | 67.6 |
YOLO v11n-GRN | 92.5 | 91.1 | 95.7 | 3.6 | 10.3 | 61.6 |
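The precision and recall columns above can be condensed into a single figure for a quick side-by-side reading. The sketch below computes the F1 score (the harmonic mean of precision and recall) for three rows of the table. F1 is not reported in the source; it is derived here only as an illustration, and the `rows` dictionary and `f1_score` helper are hypothetical names introduced for this example.

```python
# Precision and recall (in %) copied from three rows of the comparison table.
# F1 is NOT a column of that table; it is derived here for illustration only.
rows = {
    "Faster R-CNN": (51.4, 72.0),
    "YOLO v11n": (91.2, 88.8),
    "YOLO v11n-GRN": (92.5, 91.1),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Round to one decimal place to match the precision of the table.
f1 = {name: round(f1_score(p, r), 1) for name, (p, r) in rows.items()}

for name, value in f1.items():
    print(f"{name}: F1 = {value}%")
```

Because the harmonic mean is pulled toward the smaller of the two inputs, a model with unbalanced precision and recall (such as the Faster R-CNN row) scores noticeably below its better metric.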
Network Model | Precision/% | Recall/% | mAP@0.5/% |
---|---|---|---|
Faster R-CNN | 56.9 | 53.0 | 53.4 |
RT-DETR | 99.1 | 88.7 | 93.8 |
YOLO X | 90.3 | 87.0 | 92.7 |
YOLO v5n | 90.0 | 87.3 | 92.8 |
YOLO v8n | 90.8 | 88.5 | 93.6 |
YOLO v11n | 90.8 | 88.6 | 93.7 |
YOLO v11s | 90.7 | 89.0 | 94.0 |
YOLO v12n | 90.9 | 88.2 | 93.7 |
YOLO v11n-GRN | 91.6 | 89.7 | 94.4 |
Model | A | B | C | Precision/% | Recall/% | mAP@0.5/% | AP_S/% | AP_M/% | AP_L/% | Parameters/MB | GFLOPs | FPS |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | × | × | × | 91.2 | 88.8 | 94.1 | 6.0 | 41.5 | 51.4 | 2.6 | 6.4 | 73.2 |
2 | √ | × | × | 91.2 | 90.8 | 94.8 | 6.3 | 47.4 | 53.4 | 3.4 | 9.6 | 64.9 |
3 | × | √ | × | 92.2 | 89.3 | 94.5 | 6.0 | 47.3 | 53.2 | 2.7 | 6.9 | 63.7 |
4 | × | × | √ | 91.8 | 89.2 | 94.4 | 6.7 | 47.8 | 53.9 | 2.6 | 6.4 | 75.0 |
5 | √ | √ | × | 92.1 | 90.7 | 95.3 | 5.7 | 47.7 | 53.8 | 3.6 | 10.3 | 62.2 |
6 | √ | × | √ | 91.8 | 90.4 | 95.0 | 7.7 | 47.7 | 54.1 | 3.4 | 9.6 | 64.7 |
7 | × | √ | √ | 92.2 | 90.3 | 94.9 | 6.9 | 48.0 | 53.9 | 2.7 | 6.9 | 64.1 |
8 | √ | √ | √ | 92.5 | 91.1 | 95.7 | 8.0 | 48.2 | 54.3 | 3.6 | 10.3 | 61.6 |
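To read the accuracy and cost trade-off off the ablation table, it can help to subtract the baseline row (Model 1, no modules enabled) from the full configuration (Model 8, A, B and C all enabled). The following is a minimal sketch of that arithmetic; the dictionary keys and the `deltas` helper are hypothetical names chosen for this example, and the numbers are copied from rows 1 and 8 of the table.

```python
# Rows 1 (baseline) and 8 (A + B + C enabled) copied from the ablation table.
# Key names are illustrative; they are not taken from the source.
baseline = {
    "precision": 91.2, "recall": 88.8, "map50": 94.1,
    "params_mb": 2.6, "gflops": 6.4, "fps": 73.2,
}
full = {
    "precision": 92.5, "recall": 91.1, "map50": 95.7,
    "params_mb": 3.6, "gflops": 10.3, "fps": 61.6,
}


def deltas(reference, candidate):
    """Per-metric difference (candidate minus reference), rounded to 0.1."""
    return {key: round(candidate[key] - reference[key], 1) for key in reference}


change = deltas(baseline, full)

for key, value in change.items():
    print(f"{key}: {value:+.1f}")
```

Under these figures, the full configuration gains 1.3, 2.3 and 1.6 percentage points in precision, recall and mAP@0.5, at the cost of 1.0 MB more parameters, 3.9 more GFLOPs and 11.6 fewer frames per second.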
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, Y.; Liu, Z.; Guo, X.; Li, C.; Teng, G. Wheat Head Detection in Field Environments Based on an Improved YOLOv11 Model. Agriculture 2025, 15, 1765. https://doi.org/10.3390/agriculture15161765