Wampee-YOLO: A High-Precision Detection Model for Dense Clustered Wampee in Natural Orchard Scenario
Abstract
1. Introduction
- Construction of a dedicated wampee detection dataset for natural scenarios: We constructed a specialized wampee detection dataset tailored to natural orchard scenarios. The dataset comprises wampee images covering different maturity levels (colors), shooting perspectives, lighting conditions, and occlusion conditions. High-quality manual annotations were performed, providing a crucial benchmark resource for intelligent wampee detection research.
- Enhanced multi-scale feature extraction: To address the scale variability and complex background interference of wampee fruits in natural scenes, we integrated the re-parameterized high-efficiency multi-scale attention convolution (RFEMAConv) based on the C3k2 module, forming the C3k2-RFEMAConv fundamental unit. This module enhances the robustness of the backbone network in extracting discriminative multi-scale fruit features from complex environments by expanding the effective receptive field and augmenting contextual information compensation.
- Optimized small target feature localization: We introduced two improvements targeting the precision of small target feature representation. The AIFI module was used to enhance the original SPPF structure for better small target localization and recognition capabilities. Furthermore, a Triplet Attention mechanism was integrated at the end of the backbone to facilitate multi-dimensional feature interaction, thus strengthening the network’s ability to express and distinguish small target features.
- Occlusion interference suppression: Targeting the feature blurring and localization difficulties caused by dense inter-fruit occlusion and leaf occlusion, we innovatively designed the C2PSA-MSCADYT module featuring dual parallel paths. The Multi-Scale Coordinate Attention (MSCA) branch of this module captures and fuses position-aware features across multiple spatial granularities, significantly enhancing the capability to distinguish and localize occluded fruit contours. Simultaneously, its Dynamic Tanh (DYT) activation branch optimizes the gradient flow and feature representation by adaptively adjusting the non-linear response, thereby enhancing the network’s learning stability and feature discrimination under complex occlusion.
- Progressive adaptive feature pyramid (AFPN-Pro2345): The AFPN-Pro2345 feature pyramid improves the efficiency and effectiveness of multi-scale feature fusion by directly merging non-adjacent hierarchical layers. This mechanism protects the integrity of deep semantic features and shallow detail features during propagation, avoiding information loss caused by multiple sampling steps. Furthermore, it utilizes four feature layers of different scales from the backbone network to construct the feature pyramid, thereby helping the detection head accurately locate and identify objects of varying sizes.
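The "re-parameterized" idea behind RFEMAConv rests on a standard structural re-parameterization identity: parallel convolution branches applied to the same input can be merged into a single equivalent kernel at inference time. Since RFEMAConv's internals are not reproduced here, the following is only an illustrative NumPy sketch of that identity for a 3 × 3 and a 1 × 1 branch (single channel, padding 1); all names are ours, not the authors'.

```python
import numpy as np

def conv2d(x, w, pad):
    """Naive single-channel 2D convolution (cross-correlation)."""
    k = w.shape[0]
    xp = np.pad(x, pad)
    H = xp.shape[0] - k + 1
    W = xp.shape[1] - k + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * w)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
w3 = rng.standard_normal((3, 3))   # 3x3 training-time branch
w1 = rng.standard_normal((1, 1))   # 1x1 training-time branch

# Training-time output: two parallel branches summed.
branch_sum = conv2d(x, w3, pad=1) + conv2d(x, w1, pad=0)

# Inference-time re-parameterization: fold the 1x1 kernel into the
# centre of the 3x3 kernel, leaving a single equivalent convolution.
w_merged = w3.copy()
w_merged[1, 1] += w1[0, 0]
merged = conv2d(x, w_merged, pad=1)

assert np.allclose(branch_sum, merged)
```

Because convolution is linear, the merge is exact, which is why re-parameterized blocks add training-time capacity without any inference-time cost.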
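Triplet Attention, cited in the small-target contribution, builds cross-dimension interaction by Z-pooling (max + mean) along each of the three tensor dimensions and gating the input with a per-branch sigmoid map. A minimal NumPy sketch of that structure follows; for brevity the branch's 7 × 7 convolution on the pooled pair is replaced by a plain average, so this illustrates the mechanism rather than the exact module.

```python
import numpy as np

def z_pool(x, axis):
    """Stack max- and mean-pooled slices along the chosen dimension."""
    return np.stack([x.max(axis=axis), x.mean(axis=axis)], axis=0)

def branch(x, axis):
    """One triplet branch: pool along `axis`, build a sigmoid gate over the
    remaining two dimensions, and rescale the input with it.
    (The module's 7x7 conv on the pooled pair is approximated by a mean.)"""
    stats = z_pool(x, axis)                        # (2, d1, d2)
    gate = 1.0 / (1.0 + np.exp(-stats.mean(axis=0)))
    return x * np.expand_dims(gate, axis)

def triplet_attention(x):
    """Average the three branch outputs, one per tensor dimension."""
    return (branch(x, 0) + branch(x, 1) + branch(x, 2)) / 3.0

feat = np.random.default_rng(1).standard_normal((16, 8, 8))  # (C, H, W)
out = triplet_attention(feat)
assert out.shape == feat.shape
assert np.all(np.abs(out) <= np.abs(feat))  # sigmoid gates only attenuate
```

The three rotated views let channel and spatial dimensions attend to each other with almost no extra parameters, consistent with the ablation table, where adding Triplet Attention leaves the parameter count unchanged.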
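The Dynamic Tanh (DYT) branch of C2PSA-MSCADYT is described as adaptively adjusting the non-linear response. A commonly used dynamic-tanh formulation is an element-wise y = γ · tanh(α · x) + β with a learnable steepness α; the sketch below assumes that form, since the paper's exact parameterization is not reproduced here.

```python
import numpy as np

def dyt(x, alpha=0.5, gamma=1.0, beta=0.0):
    """Dynamic tanh: alpha steers how saturating the response is;
    gamma and beta rescale the bounded output (all learnable in practice)."""
    return gamma * np.tanh(alpha * x) + beta

x = np.linspace(-10.0, 10.0, 101)
soft = dyt(x, alpha=0.1)   # small alpha: near-linear response
hard = dyt(x, alpha=5.0)   # large alpha: strongly saturating response

# The output (and hence the gradient gamma * alpha * tanh'(alpha * x))
# stays bounded, which matches the stability the paper attributes to DYT.
assert np.all(np.abs(soft) <= 1.0) and np.all(np.abs(hard) <= 1.0)
```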
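Progressive pyramids that fuse non-adjacent levels, as AFPN-Pro2345 does, typically weight each level's contribution with softmax-normalized coefficients so that deep semantics and shallow detail are not forced through intermediate scales. The toy NumPy sketch below shows that weighting only, with features pre-resized to a common resolution and learned logits replaced by constants; the authors' variant differs in its details.

```python
import numpy as np

def fuse_levels(feats, logits):
    """Adaptively fuse same-sized feature maps from different pyramid
    levels using per-level softmax weights (ASFF-style)."""
    logits = np.asarray(logits, dtype=float)
    w = np.exp(logits - logits.max())
    w = w / w.sum()                      # softmax over the levels
    fused = sum(wi * f for wi, f in zip(w, feats))
    return fused, w

rng = np.random.default_rng(2)
# Stand-ins for the P2-P5 backbone levels, already resized to one scale.
p2, p3, p4, p5 = (rng.standard_normal((32, 32)) for _ in range(4))
fused, weights = fuse_levels([p2, p3, p4, p5], logits=[0.2, 0.5, 0.1, 0.2])

assert fused.shape == (32, 32)
assert np.isclose(weights.sum(), 1.0)
```

Because every level contributes directly to the fused map, no feature must survive repeated up/down-sampling before reaching the detection head, which is the information-loss argument made in the contribution above.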
2. Materials and Methods
2.1. Dataset
2.1.1. Data Acquiring
2.1.2. Data Labeling and Augmentation
2.2. Wampee-YOLO
2.2.1. Architecture
2.2.2. C3k2-RFEMAConv
2.2.3. C2PSA-MSCADYT
2.2.4. AFPN-Pro2345
2.3. Experimental Settings
2.3.1. Experimental Environment and Training Settings
2.3.2. Evaluation Metrics
3. Experimental Results and Analysis
3.1. Ablation Study
3.2. Comparison Experiments
3.2.1. Comparison of Different Attention Modules
3.2.2. Comparison of Different Detection Models
3.3. Visualization
4. Discussion
4.1. Advantages
4.2. Limitations and Future Work
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wang, S.; Liu, Z.; Zhao, M.; Gao, C.; Wang, J.; Li, C.; Dong, X.; Liu, Z.; Zhou, D. Chitosan-wampee seed essential oil composite film combined with cold plasma for refrigerated storage with modified atmosphere packaging: A promising technology for quality preservation of golden pompano fillets. Int. J. Biol. Macromol. 2023, 224, 1266–1275.
- Mo, X.; Cai, D.; Yang, H.; Chen, Q.; Xu, C.; Wang, J.; Tong, Z.; Xu, B. Changes in fruit quality parameters and volatile compounds in four wampee varieties at different ripening stages. Food Chem. X 2025, 27, 102377.
- Chang, X.; Ye, Y.; Pan, J.; Lin, Z.; Qiu, J.; Guo, X.; Lu, Y. Comparative assessment of phytochemical profiles and antioxidant activities in selected five varieties of wampee (Clausena lansium) fruits. Int. J. Food Sci. Technol. 2018, 53, 2680–2686.
- Liu, S.; Whitty, M.; Cossell, S. Automatic grape bunch detection in vineyards for precise yield estimation. In Proceedings of the 14th IAPR International Conference on Machine Vision Applications (MVA), Tokyo, Japan, 18–22 May 2015.
- Fu, L.; Duan, J.; Zou, X.; Lin, G.; Song, S.; Ji, B.; Yang, Z. Banana detection based on color and texture features in the natural environment. Comput. Electron. Agric. 2019, 167, 105057.
- Guo, Q.; Chen, Y.; Tang, Y.; Zhuang, J.; He, Y.; Hou, C.; Chu, X.; Zhong, Z.; Luo, S. Lychee fruit detection based on monocular machine vision in orchard environment. Sensors 2019, 19, 4091.
- Yu, L.; Xiong, J.; Fang, X.; Yang, Z.; Chen, Y.; Lin, X.; Chen, S. A litchi fruit recognition method in a natural environment using RGB-D images. Biosyst. Eng. 2021, 204, 50–63.
- Xiao, F.; Wang, H.; Li, Y.; Cao, Y.; Lv, X.; Xu, G. Object detection and recognition techniques based on digital image processing and traditional machine learning for fruit and vegetable harvesting robots: An overview and review. Agronomy 2023, 13, 639.
- Shi, Y.; Lian, S.; Siyao, Z. Recognition method of pheasant using enhanced Tiny-YOLOV3 model. Trans. Chin. Soc. Agric. Eng. 2020, 13, 141–147.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 16th IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 29th IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016.
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Sapkota, R.; Karkee, M. Ultralytics YOLO evolution: An overview of YOLO26, YOLO11, YOLOv8 and YOLOv5 object detectors for computer vision and pattern recognition. arXiv 2025, arXiv:2510.09653.
- Zhang, M.; Ye, S.; Zhao, S.; Wang, W.; Xie, C. Pear object detection in complex orchard environment based on improved YOLO11. Symmetry 2025, 17, 255.
- Liao, Y.; Li, L.; Xiao, H.; Xu, F.; Shan, B.; Yin, H. YOLO-MECD: Citrus detection algorithm based on YOLOv11. Agronomy 2025, 15, 687.
- Wang, A.; Xu, Y.; Hu, D.; Zhang, L.; Li, A.; Zhu, Q.; Liu, J. Tomato yield estimation using an improved lightweight YOLO11n network and an optimized region tracking-counting method. Agriculture 2025, 15, 1353.
- Li, P.; Chen, J.; Chen, Q.; Huang, L.; Jiang, Z.; Hua, W.; Li, Y. Detection and picking point localization of grape bunches and stems based on oriented bounding box. Comput. Electron. Agric. 2025, 233, 110168.
- Du, X.; Zhang, X.; Li, T.; Chen, X.; Yu, X.; Wang, H. YOLO-WAS: A lightweight apple target detection method based on improved YOLO11. Agriculture 2025, 15, 1521.
- Nan, Y.; Zhang, H.; Zeng, Y.; Zheng, J.; Ge, Y. Intelligent detection of multi-class pitaya fruits in target picking row based on WGB-YOLO network. Comput. Electron. Agric. 2023, 208, 107780.
- Bai, Y.; Yu, J.; Yang, S.; Ning, J. An improved YOLO algorithm for detecting flowers and fruits on strawberry seedlings. Biosyst. Eng. 2024, 237, 1–12.
- Zhao, Y.; Chen, Y.; Xu, X.; He, Y.; Gan, H.; Wu, N.; Wang, Z.; Sun, X.; Wang, Y.; Skobelev, P.; et al. Ta-YOLO: Overcoming target blocked challenges in greenhouse tomato detection and counting. Front. Plant Sci. 2025, 16, 1618214.
- Aldubaikhi, A.; Patel, S. Advancements in small-object detection (2023–2025): Approaches, datasets, benchmarks, applications, and practical guidance. Appl. Sci. 2025, 15, 11882.
- Wang, H.; Wang, H.; Liu, Q. TW-YOLO: High-precision steel wire rope detection algorithm based on triplet attention. Adv. Comput. Commun. 2025, 6, 48–54.
- Zhang, Q.; Liu, Y.; Gong, C.; Chen, Y.; Yu, H. Applications of deep learning for dense scenes analysis in agriculture: A review. Sensors 2020, 20, 1520.
- Khanam, R.; Hussain, M. What is YOLOv5: A deep look into the internal features of the popular object detector. arXiv 2024, arXiv:2407.20892.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J. YOLOv10: Real-time end-to-end object detection. In Advances in Neural Information Processing Systems, Proceedings of the Neural Information Processing Systems Conference, Vancouver, BC, Canada, 16 December 2024; Neural Information Processing Systems Foundation: San Diego, CA, USA, 2024.
- Khanam, R.; Hussain, M. YOLOv11: An overview of the key architectural enhancements. arXiv 2024, arXiv:2410.17725.
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs beat YOLOs on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024.
- Yang, K.; Song, Z. Deep learning-based object detection improvement for fine-grained birds. IEEE Access 2021, 9, 67901–67915.
- Fu, R.; Hu, Q.; Dong, X.; Guo, Y.; Gao, Y.; Li, B. Axiom-based Grad-CAM: Towards accurate visualization and explanation of CNNs. arXiv 2020, arXiv:2008.02312.
- Liu, X.; Peng, H.; Zheng, N.; Yang, Y.; Hu, H.; Yuan, Y. EfficientViT: Memory efficient vision transformer with cascaded group attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023.
- Chen, J.; Kao, S.-H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.-H.; Chan, S.-H.G. Run, don’t walk: Chasing higher FLOPS for faster neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023.
- Qin, D.; Leichner, C.; Delakis, M.; Fornoni, M.; Luo, S.; Yang, F.; Wang, W.; Banbury, C.; Ye, C.; Akin, B. MobileNetV4: Universal models for the mobile ecosystem. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024.
- Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023.









| Parameter | Setting |
|---|---|
| Epochs | 200 |
| Batch size | 8 |
| Learning rate | 1 × 10⁻⁴ |
| Image size | 640 × 640 |
| Weight decay | 0.0005 |
| Momentum | 0.937 |
| Optimizer | AdamW |
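Assuming an Ultralytics-style training interface (the usual tooling for YOLO11 derivatives; the paper's actual scripts are not shown here), the settings in the table map onto hyperparameters along these lines. The argument names follow Ultralytics' conventions and the mapping is our assumption:

```python
# Hypothetical mapping of the table's settings to Ultralytics-style
# train() keyword arguments; dataset path is a placeholder.
train_args = {
    "epochs": 200,          # training epochs
    "batch": 8,             # batch size
    "lr0": 1e-4,            # initial learning rate
    "imgsz": 640,           # input image size (640 x 640)
    "weight_decay": 0.0005,
    "momentum": 0.937,
    "optimizer": "AdamW",
}
# e.g. model.train(data="wampee.yaml", **train_args)  # path hypothetical
```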
| AFPN-Pro2345 | AIFI | C3k2-RFEMAConv | C2PSA-MSCADYT | Triplet Attention | P/% | R/% | F1/% | mAP50/% | mAP50–95/% | Parameters/M | GFLOPs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | 90.1 | 78.9 | 84.1 | 86.9 | 54.7 | 2.58 | 6.3 |
| √ | | | | | 91.9 | 80.8 | 86.0 | 89.3 | 57.5 | 2.56 | 17.8 |
| | √ | | | | 90.7 | 79.2 | 84.6 | 87.3 | 54.8 | 3.21 | 6.6 |
| | | √ | | | 91.7 | 78.2 | 84.4 | 87.1 | 54.8 | 2.59 | 10.4 |
| | | | √ | | 90.7 | 79.0 | 84.4 | 87.0 | 54.9 | 2.61 | 6.3 |
| | | | | √ | 90.9 | 78.9 | 84.5 | 87.2 | 54.9 | 2.58 | 6.3 |
| √ | √ | | | | 91.4 | 81.3 | 86.1 | 89.4 | 57.6 | 3.18 | 18.1 |
| √ | √ | √ | | | 91.4 | 81.9 | 86.4 | 89.9 | 58.2 | 3.25 | 21.8 |
| √ | √ | √ | √ | | 91.9 | 82.2 | 86.8 | 90.2 | 58.3 | 3.28 | 21.8 |
| √ | √ | √ | √ | √ | 92.1 | 82.7 | 87.0 | 90.3 | 58.3 | 3.28 | 21.9 |
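The F1 column follows directly from the precision and recall columns via F1 = 2PR/(P + R). A quick sanity check against the first two rows of the ablation table (small rounding drift is possible elsewhere if the authors computed F1 from unrounded P and R):

```python
def f1(p, r):
    """Harmonic mean of precision and recall, both given in percent."""
    return 2 * p * r / (p + r)

# Baseline row: P = 90.1, R = 78.9 -> F1 rounds to 84.1
assert round(f1(90.1, 78.9), 1) == 84.1
# +AFPN-Pro2345 row: P = 91.9, R = 80.8 -> F1 rounds to 86.0
assert round(f1(91.9, 80.8), 1) == 86.0
```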
| Models | P/% | R/% | F1/% | mAP50/% | mAP50–95/% | Parameters/M | GFLOPs |
|---|---|---|---|---|---|---|---|
| Triplet Attention | 92.1 | 82.7 | 87.0 | 90.3 | 58.3 | 3.28 | 21.9 |
| SEAM | 91.0 | 82.0 | 86.3 | 89.9 | 58.1 | 3.43 | 22.0 |
| CBAM | 91.4 | 81.6 | 86.2 | 89.8 | 58.1 | 3.35 | 21.8 |
| AFGCAttention | 91.5 | 81.9 | 86.4 | 90.0 | 58.1 | 3.35 | 21.8 |
| BAMblock | 91.7 | 81.8 | 86.5 | 89.9 | 58.3 | 3.30 | 21.9 |
| LSKBlock | 91.6 | 82.1 | 86.6 | 90.2 | 58.3 | 3.53 | 22.0 |
| Models | P/% | R/% | F1/% | mAP50/% | mAP50–95/% | Parameters/M | GFLOPs |
|---|---|---|---|---|---|---|---|
| YOLOv5n | 91.0 | 77.0 | 83.4 | 86.5 | 54.4 | 2.18 | 5.8 |
| YOLOv8n | 90.7 | 79.5 | 84.7 | 86.8 | 55.0 | 2.58 | 6.8 |
| YOLOv10n | 89.7 | 79.5 | 84.3 | 87.9 | 55.5 | 2.27 | 6.5 |
| YOLO11n | 90.1 | 78.9 | 84.1 | 86.9 | 54.7 | 2.58 | 6.3 |
| RT-DETR | 91.4 | 82.9 | 86.9 | 89.5 | 57.3 | 19.87 | 56.9 |
| Wampee-YOLO | 92.1 | 82.7 | 87.0 | 90.3 | 58.3 | 3.28 | 21.9 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Li, Z.; Xie, Y.; Wang, J.; Huang, G.; Yu, L.; Zhang, K.; Li, J.; Liu, C. Wampee-YOLO: A High-Precision Detection Model for Dense Clustered Wampee in Natural Orchard Scenario. Horticulturae 2026, 12, 232. https://doi.org/10.3390/horticulturae12020232

