Enhanced Image Annotation in Wild Blueberry (Vaccinium angustifolium Ait.) Fields Using Sequential Zero-Shot Detection and Segmentation Models
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset Preparation
2.2. Model Selection
2.3. Optimization and Tuning
2.4. SAM Automatic Annotation Integration
2.5. Evaluation Procedure
2.6. Statistical Analysis
3. Results and Discussion
3.1. Qualitative Visualization and Prompt-Specific Observations
3.2. Failure Analysis
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AI | Artificial Intelligence |
| ANOVA | Analysis of Variance |
| CLIP | Contrastive Language–Image Pretraining |
| CPU | Central Processing Unit |
| GPU | Graphics Processing Unit |
| HSD | Honest Significant Difference |
| IoU | Intersection over Union |
| LLM | Large Language Model |
| mIoU | Mean Intersection over Union |
| SAM | Segment Anything Model |
| SAM2 | Segment Anything Model Version 2 |
| ViT | Vision Transformer |
| YOLO | You Only Look Once |
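Several of the abbreviations above (IoU, mIoU) name the mask-overlap metrics reported throughout the appendix tables. As an illustrative sketch only (pure Python, hypothetical masks, not the authors' evaluation code), IoU on binary masks and its mean over images can be computed as:

```python
def iou(pred: set, truth: set) -> float:
    """Intersection over Union of two binary masks given as sets of pixel coordinates."""
    if not pred and not truth:
        return 1.0  # both empty: treat as perfect agreement
    return len(pred & truth) / len(pred | truth)

def f1(precision: float, recall: float) -> float:
    """F1-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical masks as sets of (row, col) pixels
pred = {(0, 0), (0, 1), (1, 0)}
truth = {(0, 1), (1, 0), (1, 1)}
print(iou(pred, truth))  # 2 shared pixels / 4 in the union = 0.5

# mIoU is the mean IoU over all evaluated images
masks = [({(0, 0)}, {(0, 0)}), (pred, truth)]
miou = sum(iou(p, t) for p, t in masks) / len(masks)
print(miou)  # (1.0 + 0.5) / 2 = 0.75
```

The set representation is chosen only for clarity; in practice these masks would be boolean arrays and the same ratios would be computed with logical AND/OR.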
Appendix A
| Detection Model | Segmentation Model | F1-Score | Precision | Recall |
|---|---|---|---|---|
| YOLO-World-s | Tiny | 0.198 ± 0.111 | 0.1144 ± 0.0738 | 0.9993 ± 0.001 |
| YOLO-World-m | Tiny | 0.2165 ± 0.1191 | 0.1276 ± 0.0799 | 0.9838 ± 0.049 |
| YOLO-World-l | Tiny | 0.198 ± 0.111 | 0.1144 ± 0.0738 | 0.9993 ± 0.001 |
| YOLO-World-x | Tiny | 0.198 ± 0.111 | 0.1144 ± 0.0738 | 0.9993 ± 0.001 |
| YOLO-World-s | Small | 0.198 ± 0.111 | 0.1144 ± 0.0738 | 0.9993 ± 0.001 |
| YOLO-World-m | Small | 0.2151 ± 0.1175 | 0.1257 ± 0.0777 | 0.9971 ± 0.007 |
| YOLO-World-l | Small | 0.198 ± 0.111 | 0.1144 ± 0.0738 | 0.9993 ± 0.001 |
| YOLO-World-x | Small | 0.198 ± 0.111 | 0.1144 ± 0.0738 | 0.9993 ± 0.001 |
| YOLO-World-s | Base | 0.194 ± 0.1156 | 0.1123 ± 0.076 | 0.9527 ± 0.1465 |
| YOLO-World-m | Base | 0.194 ± 0.1156 | 0.1123 ± 0.076 | 0.9527 ± 0.1465 |
| YOLO-World-l | Base | 0.1981 ± 0.111 | 0.1145 ± 0.0738 | 0.9993 ± 0.001 |
| YOLO-World-x | Base | 0.1937 ± 0.1158 | 0.1122 ± 0.076 | 0.9507 ± 0.1494 |
| YOLO-World-s | Large | 0.1927 ± 0.1164 | 0.1117 ± 0.0764 | 0.9424 ± 0.1513 |
| YOLO-World-m | Large | 0.1928 ± 0.1163 | 0.1117 ± 0.0764 | 0.9431 ± 0.1512 |
| YOLO-World-l | Large | 0.1982 ± 0.1103 | 0.1103 ± 0.0735 | 0.9961 ± 0.01 |
| YOLO-World-x | Large | 0.192 ± 0.1168 | 0.1113 ± 0.0766 | 0.9369 ± 0.1538 |
| SwinB | Tiny | 0.7295 ± 0.1764 | 0.6543 ± 0.2311 | 0.9177 ± 0.1281 |
| SwinB | Small | 0.7414 ± 0.1836 | 0.6637 ± 0.2283 | 0.9312 ± 0.1317 |
| SwinB | Base | 0.7576 ± 0.1802 | 0.6954 ± 0.2282 | 0.917 ± 0.1302 |
| SwinB | Large | 0.7617 ± 0.1744 | 0.711 ± 0.2263 | 0.9037 ± 0.1368 |
| SwinT | Tiny | 0.767 ± 0.1579 | 0.7423 ± 0.1902 | 0.8345 ± 0.1611 |
| SwinT | Small | 0.7716 ± 0.1676 | 0.7401 ± 0.2022 | 0.8599 ± 0.1536 |
| SwinT | Base | 0.79 ± 0.1666 | 0.7881 ± 0.1938 | 0.8449 ± 0.1739 |
| SwinT | Large | 0.8173 ± 0.102 | 0.8206 ± 0.1009 | 0.8333 ± 0.1528 |
| Detection Model | Segmentation Model | F1-Score | Precision | Recall |
|---|---|---|---|---|
| YOLO-World-s | Tiny | 0.5331 ± 0.3804 | 0.482 ± 0.4132 | 0.9759 ± 0.0312 |
| YOLO-World-m | Tiny | 0.331 ± 0.2797 | 0.3277 ± 0.334 | 0.7925 ± 0.315 |
| YOLO-World-l | Tiny | 0.3392 ± 0.3848 | 0.3218 ± 0.4172 | 0.6648 ± 0.4397 |
| YOLO-World-x | Tiny | 0.8994 ± 0.1127 | 0.9322 ± 0.1509 | 0.8944 ± 0.1162 |
| YOLO-World-s | Small | 0.486 ± 0.3875 | 0.4465 ± 0.4326 | 0.9772 ± 0.0506 |
| YOLO-World-m | Small | 0.3481 ± 0.3279 | 0.4145 ± 0.4082 | 0.8303 ± 0.3282 |
| YOLO-World-l | Small | 0.3346 ± 0.3795 | 0.3174 ± 0.4114 | 0.6606 ± 0.4382 |
| YOLO-World-x | Small | 0.9364 ± 0.0551 | 0.9636 ± 0.0488 | 0.9418 ± 0.082 |
| YOLO-World-s | Base | 0.5463 ± 0.3894 | 0.5227 ± 0.4359 | 0.9568 ± 0.0909 |
| YOLO-World-m | Base | 0.4139 ± 0.3363 | 0.3496 ± 0.3585 | 0.9744 ± 0.0504 |
| YOLO-World-l | Base | 0.2738 ± 0.3446 | 0.3099 ± 0.4206 | 0.507 ± 0.459 |
| YOLO-World-x | Base | 0.8735 ± 0.1773 | 0.9175 ± 0.1686 | 0.8899 ± 0.1989 |
| YOLO-World-s | Large | 0.4807 ± 0.3952 | 0.5252 ± 0.4402 | 0.8948 ± 0.2709 |
| YOLO-World-m | Large | 0.4302 ± 0.351 | 0.3557 ± 0.3533 | 0.9927 ± 0.0135 |
| YOLO-World-l | Large | 0.325 ± 0.4009 | 0.3087 ± 0.4183 | 0.5739 ± 0.4709 |
| YOLO-World-x | Large | 0.9011 ± 0.1759 | 0.9016 ± 0.224 | 0.9489 ± 0.04 |
| SwinB | Tiny | 0.9281 ± 0.0805 | 0.9824 ± 0.014 | 0.8885 ± 0.129 |
| SwinB | Small | 0.9252 ± 0.0779 | 0.9724 ± 0.0461 | 0.8927 ± 0.1248 |
| SwinB | Base | 0.9368 ± 0.0769 | 0.9818 ± 0.0092 | 0.9045 ± 0.1264 |
| SwinB | Large | 0.9391 ± 0.0771 | 0.9779 ± 0.1253 | 0.909 ± 0.1177 |
| SwinT | Tiny | 0.9401 ± 0.0642 | 0.9795 ± 0.1037 | 0.9097 ± 0.1029 |
| SwinT | Small | 0.9331 ± 0.0648 | 0.9682 ± 0.0483 | 0.9074 ± 0.101 |
| SwinT | Base | 0.9377 ± 0.0759 | 0.9828 ± 0.008 | 0.9055 ± 0.1256 |
| SwinT | Large | 0.953 ± 0.0588 | 0.9775 ± 0.0265 | 0.9342 ± 0.0929 |
| Detection Model | Segmentation Model | F1-Score | Precision | Recall |
|---|---|---|---|---|
| YOLO-World-s | Tiny | 0.6064 ± 0.2147 | 0.6532 ± 0.1996 | 0.6289 ± 0.2835 |
| YOLO-World-m | Tiny | 0.5239 ± 0.1629 | 0.4935 ± 0.2561 | 0.7784 ± 0.234 |
| YOLO-World-l | Tiny | 0.6774 ± 0.1301 | 0.6021 ± 0.1803 | 0.8486 ± 0.1659 |
| YOLO-World-x | Tiny | 0.7441 ± 0.1271 | 0.7454 ± 0.1937 | 0.8187 ± 0.1808 |
| YOLO-World-s | Small | 0.6109 ± 0.2184 | 0.7181 ± 0.1699 | 0.6103 ± 0.2861 |
| YOLO-World-m | Small | 0.5206 ± 0.1043 | 0.5172 ± 0.2567 | 0.7896 ± 0.2671 |
| YOLO-World-l | Small | 0.6697 ± 0.1395 | 0.5919 ± 0.1944 | 0.8559 ± 0.1593 |
| YOLO-World-x | Small | 0.7316 ± 0.1401 | 0.738 ± 0.2119 | 0.8064 ± 0.1775 |
| YOLO-World-s | Base | 0.5711 ± 0.2565 | 0.6826 ± 0.2464 | 0.5207 ± 0.2795 |
| YOLO-World-m | Base | 0.5025 ± 0.1222 | 0.4849 ± 0.2432 | 0.7622 ± 0.2629 |
| YOLO-World-l | Base | 0.7255 ± 0.1206 | 0.7101 ± 0.1863 | 0.7933 ± 0.1541 |
| YOLO-World-x | Base | 0.7181 ± 0.1538 | 0.8042 ± 0.1875 | 0.714 ± 0.2103 |
| YOLO-World-s | Large | 0.6078 ± 0.214 | 0.7682 ± 0.2197 | 0.5255 ± 0.2368 |
| YOLO-World-m | Large | 0.5224 ± 0.1642 | 0.5008 ± 0.2638 | 0.7717 ± 0.2584 |
| YOLO-World-l | Large | 0.7597 ± 0.0954 | 0.7762 ± 0.1962 | 0.8208 ± 0.1615 |
| YOLO-World-x | Large | 0.7206 ± 0.1546 | 0.8182 ± 0.1954 | 0.7122 ± 0.2099 |
| SwinB | Tiny | 0.8529 ± 0.1126 | 0.8948 ± 0.0355 | 0.8426 ± 0.19 |
| SwinB | Small | 0.8542 ± 0.1128 | 0.8989 ± 0.039 | 0.8421 ± 0.1903 |
| SwinB | Base | 0.8564 ± 0.1156 | 0.8994 ± 0.0443 | 0.8443 ± 0.1898 |
| SwinB | Large | 0.8612 ± 0.1167 | 0.9165 ± 0.0441 | 0.8386 ± 0.1884 |
| SwinT | Tiny | 0.7622 ± 0.1654 | 0.917 ± 0.0583 | 0.6764 ± 0.2079 |
| SwinT | Small | 0.7645 ± 0.165 | 0.9235 ± 0.0619 | 0.677 ± 0.2078 |
| SwinT | Base | 0.7673 ± 0.1669 | 0.9253 ± 0.0596 | 0.6781 ± 0.207 |
| SwinT | Large | 0.7698 ± 0.1681 | 0.9405 ± 0.0624 | 0.6743 ± 0.206 |
| Detection Model | Segmentation Model | F1-Score | Precision | Recall |
|---|---|---|---|---|
| YOLO-World-s | Tiny | 0.7639 ± 0.1197 | 0.8295 ± 0.0612 | 0.7251 ± 0.1815 |
| YOLO-World-m | Tiny | 0.8276 ± 0.1113 | 0.839 ± 0.0805 | 0.8284 ± 0.155 |
| YOLO-World-l | Tiny | 0.8108 ± 0.1429 | 0.7779 ± 0.1585 | 0.8497 ± 0.1264 |
| YOLO-World-x | Tiny | 0.8029 ± 0.1274 | 0.7667 ± 0.1276 | 0.8591 ± 0.1585 |
| YOLO-World-s | Small | 0.7599 ± 0.1192 | 0.82 ± 0.0988 | 0.7421 ± 0.1957 |
| YOLO-World-m | Small | 0.818 ± 0.11 | 0.8428 ± 0.0895 | 0.8057 ± 0.1458 |
| YOLO-World-l | Small | 0.8106 ± 0.1439 | 0.7844 ± 0.1606 | 0.8417 ± 0.1287 |
| YOLO-World-x | Small | 0.7984 ± 0.1221 | 0.768 ± 0.1274 | 0.8491 ± 0.1543 |
| YOLO-World-s | Base | 0.7649 ± 0.1105 | 0.8444 ± 0.0569 | 0.7238 ± 0.1823 |
| YOLO-World-m | Base | 0.8174 ± 0.108 | 0.8337 ± 0.0794 | 0.8118 ± 0.1471 |
| YOLO-World-l | Base | 0.8096 ± 0.1424 | 0.7762 ± 0.1587 | 0.8494 ± 0.1262 |
| YOLO-World-x | Base | 0.7944 ± 0.1258 | 0.7577 ± 0.1304 | 0.852 ± 0.1558 |
| YOLO-World-s | Large | 0.783 ± 0.1255 | 0.8631 ± 0.0637 | 0.7401 ± 0.192 |
| YOLO-World-m | Large | 0.8169 ± 0.1128 | 0.8384 ± 0.0966 | 0.808 ± 0.1458 |
| YOLO-World-l | Large | 0.8121 ± 0.1402 | 0.7833 ± 0.1571 | 0.8462 ± 0.1245 |
| YOLO-World-x | Large | 0.8027 ± 0.1246 | 0.7728 ± 0.1259 | 0.8507 ± 0.1555 |
| SwinB | Tiny | 0.8274 ± 0.148 | 0.7973 ± 0.1915 | 0.885 ± 0.0996 |
| SwinB | Small | 0.8364 ± 0.1387 | 0.8147 ± 0.1834 | 0.8811 ± 0.0951 |
| SwinB | Base | 0.833 ± 0.1376 | 0.8041 ± 0.177 | 0.8836 ± 0.0984 |
| SwinB | Large | 0.8405 ± 0.1346 | 0.8183 ± 0.1767 | 0.8832 ± 0.0959 |
| SwinT | Tiny | 0.8305 ± 0.1242 | 0.8941 ± 0.0826 | 0.7865 ± 0.1699 |
| SwinT | Small | 0.8341 ± 0.1227 | 0.9084 ± 0.0778 | 0.7817 ± 0.1671 |
| SwinT | Base | 0.829 ± 0.1254 | 0.8887 ± 0.0848 | 0.7874 ± 0.1701 |
| SwinT | Large | 0.8358 ± 0.1237 | 0.9077 ± 0.0816 | 0.7849 ± 0.1667 |
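Each cell in the tables above reports the mean ± standard deviation of per-image scores for one detection/segmentation pairing. A minimal sketch of that aggregation (pure Python with illustrative values; the population-standard-deviation convention is an assumption, as the tables do not state which convention was used):

```python
import math

def mean_std(scores):
    """Return (mean, population standard deviation) of per-image scores."""
    m = sum(scores) / len(scores)
    var = sum((s - m) ** 2 for s in scores) / len(scores)
    return m, math.sqrt(var)

# Illustrative per-image F1 scores for one detector/segmenter pairing
per_image_f1 = [0.91, 0.87, 0.95, 0.78, 0.93]
m, s = mean_std(per_image_f1)
print(f"{m:.4f} ± {s:.4f}")  # formatted like the table cells above
```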
References
- Bilodeau, M.F.; Esau, T.J.; MacEachern, C.B.; Farooque, A.A.; White, S.N.; Zaman, Q.U. Identifying Hair Fescue in Wild Blueberry Fields Using Drone Images for Precise Application of Granular Herbicide. Smart Agric. Technol. 2023, 3, 100127.
- Chen, S.W.; Shivakumar, S.S.; Dcunha, S.; Das, J.; Okon, E.; Qu, C.; Taylor, C.J.; Kumar, V. Counting Apples and Oranges with Deep Learning: A Data-Driven Approach. IEEE Robot. Autom. Lett. 2017, 2, 781–788.
- Fiona, J.R.; Anitha, J. Automated Detection of Plant Diseases and Crop Analysis in Agriculture Using Image Processing Techniques: A Survey. In Proceedings of the 2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), Coimbatore, India, 20–22 February 2019; pp. 1–5.
- Nuske, S.; Achar, S.; Bates, T.; Narasimhan, S.; Singh, S. Yield Estimation in Vineyards by Visual Grape Detection. In Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, USA, 25–30 September 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 2352–2358.
- Cordier, A.; Gutierrez, P.; Plessis, V. Improving Generalization with Synthetic Training Data for Deep Learning Based Quality Inspection. arXiv 2022, arXiv:2202.12818.
- Said, A.F.; Kashyap, V.; Choudhury, N.; Akhbari, F. A Cost-Effective, Fast, and Robust Annotation Tool. In Proceedings of the 2017 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 10–12 October 2017; pp. 1–6.
- MacEachern, C.B.; Esau, T.J.; Schumann, A.W.; Hennessy, P.J.; Zaman, Q.U. Detection of Fruit Maturity Stage and Yield Estimation in Wild Blueberry Using Deep Learning Convolutional Neural Networks. Smart Agric. Technol. 2023, 3, 100099.
- Manfrini, L.; Pierpaoli, E.; Zibordi, M.; Morandi, B.; Muzzi, E.; Losciale, P.; Grappadelli, L.C. Monitoring Strategies for Precise Production of High Quality Fruit and Yield in Apple in Emilia-Romagna. Chem. Eng. Trans. 2015, 44, 301–306.
- MacEachern, C.B.; Esau, T.J.; Zaman, Q.U.; White, S.N.; Farooque, A.A. Development of a Novel Precision Applicator for Spot Treatment of Granular Agrochemical in Wild Blueberry. Sci. Rep. 2024, 14, 13751.
- Zaman, Q.U.; Esau, T.J.; Schumann, A.W.; Percival, D.C.; Chang, Y.K.; Read, S.M.; Farooque, A.A. Development of Prototype Automated Variable Rate Sprayer for Real-Time Spot-Application of Agrochemicals in Wild Blueberry Fields. Comput. Electron. Agric. 2011, 76, 175–182.
- Abbey, J.; Percival, D.; Jaakola, L.; Asiedu, S.K. Efficacy, Persistence and Residue Levels of Fungicides for Botrytis Control in Wild Blueberry. Crop Prot. 2024, 179, 106633.
- White, S.N. Evaluation of Herbicides for Hair Fescue (Festuca filiformis) Management and Potential Seedbank Reduction in Lowbush Blueberry. Weed Technol. 2019, 33, 840–846.
- Eaton, L.J.; Nams, V.O. Second Cropping of Wild Blueberries—Effects of Management Practices. Can. J. Plant Sci. 2006, 86, 1189–1195.
- Burgess, P. Guide to Weed, Insect and Disease Management in Wild Blueberry; Perennia: Kentville, NS, Canada, 2018.
- White, S.N.; Boyd, N.S.; Van Acker, R.C. Growing Degree-Day Models for Predicting Lowbush Blueberry (Vaccinium angustifolium Ait.) Ramet Emergence, Tip Dieback, and Flowering in Nova Scotia, Canada. HortScience 2012, 47, 1014–1021.
- Perennia Food and Agriculture Corporation. Nova Scotia Wild Blueberry Crop Development Schedule and Management Recommendations. 2024. Available online: https://www.perennia.ca/wp-content/uploads/2018/03/Nova-Scotia-Wild-Blueberry-Crop-Development-Schedule-and-Management-Recommendations-2024.pdf (accessed on 3 October 2025).
- Jewell, L.E.; Compton, K.; Wiseman, D. Evidence for a Genetically Distinct Population of Exobasidium sp. Causing Atypical Leaf Blight Symptoms on Lowbush Blueberry (Vaccinium angustifolium Ait.) in Newfoundland and Labrador, Canada. Can. J. Plant Pathol. 2021, 43, 897–904.
- Percival, D.C.; Dawson, J.K. Foliar Disease Impact and Possible Control Strategies in Wild Blueberry Production. In Proceedings of the ISHS Acta Horticulturae 810: IX International Vaccinium Symposium, Corvallis, OR, USA, 13–16 July 2008; pp. 345–354.
- Fenu, G.; Malloci, F.M. DiaMOS Plant: A Dataset for Diagnosis and Monitoring Plant Disease. Agronomy 2021, 11, 2107.
- Lucas, G.B.; Campbell, C.L.; Lucas, L.T. Diseases Caused by Viruses. In Introduction to Plant Diseases: Identification and Management; Springer: Boston, MA, USA, 1992; pp. 291–308. ISBN 978-0-412-06961-1.
- Hildebrand, P.D.; Nickerson, N.L.; McRae, K.B.; Lu, X. Incidence and Impact of Red Leaf Disease Caused by Exobasidium vaccinii in Lowbush Blueberry Fields in Nova Scotia. Can. J. Plant Pathol. 2000, 22, 364–367.
- Lyu, H.; McLean, N.; McKenzie-Gopsill, A.; White, S.N. Weed Survey of Nova Scotia Lowbush Blueberry (Vaccinium angustifolium Ait.) Fields. Int. J. Fruit Sci. 2021, 21, 359–378.
- MacEachern, C.B.; Esau, T.J.; White, S.N.; Zaman, Q.U.; Farooque, A.A. Evaluation of Dichlobenil for Hair Fescue (Festuca filiformis Pourr.) Management in Wild Blueberry (Vaccinium angustifolium Ait.). Agron. J. 2024, 116, 590–597.
- Esau, T.; Zaman, Q.; Groulx, D.; Farooque, A.; Schumann, A.; Chang, Y. Machine Vision Smart Sprayer for Spot-Application of Agrochemical in Wild Blueberry Fields. Precis. Agric. 2018, 19, 770–788.
- Li, Y.; Kong, D.; Zhang, Y.; Tan, Y.; Chen, L. Robust Deep Alignment Network with Remote Sensing Knowledge Graph for Zero-Shot and Generalized Zero-Shot Remote Sensing Image Scene Classification. ISPRS J. Photogramm. Remote Sens. 2021, 179, 145–158.
- Mullins, C.C.; Esau, T.J.; Zaman, Q.U.; Toombs, C.L.; Hennessy, P.J. Leveraging Zero-Shot Detection Mechanisms to Accelerate Image Annotation for Machine Learning in Wild Blueberry (Vaccinium angustifolium Ait.). Agronomy 2024, 14, 2830.
- Cheng, T.; Song, L.; Ge, Y.; Liu, W.; Wang, X.; Shan, Y. YOLO-World: Real-Time Open-Vocabulary Object Detection. arXiv 2024, arXiv:2401.17270.
- Rettenberger, L.; Schilling, M.; Reischl, M. Annotation Efforts in Image Segmentation Can Be Reduced by Neural Network Bootstrapping. Curr. Dir. Biomed. Eng. 2022, 8, 329–332.
- Sánchez, J.S.; Lisani, J.-L.; Catalán, I.A.; Álvarez-Ellacuría, A. Leveraging Bounding Box Annotations for Fish Segmentation in Underwater Images. IEEE Access 2023, 11, 125984–125994.
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y. Segment Anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 4015–4026.
- Mazurowski, M.A.; Dong, H.; Gu, H.; Yang, J.; Konz, N.; Zhang, Y. Segment Anything Model for Medical Image Analysis: An Experimental Study. Med. Image Anal. 2023, 89, 102918.
- Yuan, Z. Principles, Applications, and Advancements of the Segment Anything Model. Appl. Comput. Eng. 2024, 53, 73–78.
- Ren, T.; Liu, S.; Zeng, A.; Lin, J.; Li, K.; Cao, H.; Chen, J.; Huang, X.; Chen, Y.; Yan, F.; et al. Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks. arXiv 2024, arXiv:2401.14159.
- Brust, C.-A.; Käding, C.; Denzler, J. Active Learning for Deep Object Detection. arXiv 2018, arXiv:1809.09875.
- Schmidt, S.; Rao, Q.; Tatsch, J.; Knoll, A. Advanced Active Learning Strategies for Object Detection. In Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA, 19 October–13 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 871–876.
- Gurav, R.; Patel, H.; Shang, Z.; Eldawy, A.; Chen, J.; Scudiero, E.; Papalexakis, E. Can SAM Recognize Crops? Quantifying the Zero-Shot Performance of a Semantic Segmentation Foundation Model on Generating Crop-Type Maps Using Satellite Imagery for Precision Agriculture. arXiv 2023, arXiv:2311.15138.
- Casas, G.G.; Ismail, Z.H.; Shapiai, M.I.; Karuppiah, E.K. Automated Detection and Segmentation of Baby Kale Crowns Using Grounding DINO and SAM for Data-Scarce Agricultural Applications. Smart Agric. Technol. 2025, 11, 100903.
- Tian, Q.; Zhang, H.; Bian, L.; Zhou, L.; Shen, Z.; Ge, Y. Field-Based Phenotyping for Poplar Seedlings Biomass Evaluation Based on Zero-Shot Segmentation with Multimodal UAV Images. Comput. Electron. Agric. 2025, 236, 110462.
- Yang, C.; Jin, Q.; Wang, Y.; Zhou, Y.; Lan, D.; Yang, Y. EHAPZero: Ensemble Hierarchical Attribute Prompting Based Zero-Shot Learning for Pest Recognition. IEEE Internet Things J. 2024, 12, 49107–49119.
- Esmaeilpour, S.; Liu, B.; Robertson, E.; Shu, L. Zero-Shot Out-of-Distribution Detection Based on the Pre-Trained Model CLIP. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 6568–6576.
- Ravi, N.; Gabeur, V.; Hu, Y.-T.; Hu, R.; Ryali, C.; Ma, T.; Khedr, H.; Rädle, R.; Rolland, C.; Gustafson, L.; et al. SAM 2: Segment Anything in Images and Videos. arXiv 2024, arXiv:2408.00714.
- Rahman, M.A.; Wang, Y. Optimizing Intersection-Over-Union in Deep Neural Networks for Image Segmentation. In Advances in Visual Computing; Bebis, G., Boyle, R., Parvin, B., Koracin, D., Porikli, F., Skaff, S., Entezari, A., Min, J., Iwai, D., Sadagic, A., et al., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2016; Volume 10072, pp. 234–244. ISBN 978-3-319-50834-4.
- Montgomery, D.C. Design and Analysis of Experiments; John Wiley & Sons: Hoboken, NJ, USA, 2017; ISBN 978-1-119-11347-8.
| Class | Images | Object Instances | Location | Equipment | Resolution |
|---|---|---|---|---|---|
| Developmental buds | 72 | 1266 | 45.493991°, −62.990759°; 45.472361°, −63.631111° | iPhone 8 (Apple Inc., Cupertino, CA, USA) | 4032 × 3024 |
| Ripe blueberries | 66 | 2227 | 45.440839°, −63.542934° | Canon EOS RP (Canon Inc., Tokyo, Japan) | 6240 × 4160 |
| Red leaf disease | 66 | 130 | 45.500750°, −63.107650° | Fujifilm HS30EXR (Fujifilm Corporation, Tokyo, Japan) | 4608 × 3456 |
| Hair fescue | 20 | 436 | Multiple NS fields | DJI M300 RTK + P1 (SZ DJI Technology Co., Ltd., Shenzhen, China) | 8192 × 5460 |
| Model | Size (M Parameters) | Speed (FPS, A100) | SA-V Test (J&F) |
|---|---|---|---|
| SAM2 tiny | 38.9 | 91.2 | 76.5 |
| SAM2 small | 46 | 84.8 | 76.6 |
| SAM2 base | 80.8 | 64.1 | 78.2 |
| SAM2 large | 224.4 | 39.5 | 79.5 |
| Framework | Model | Dataset | Prompt | Confidence (%) | mIoU | Precision (%) | Recall (%) |
|---|---|---|---|---|---|---|---|
| Grounding DINO | Swin-T | Bud | Buds emerging | 41.00 | 56.2 | 90.91 | 44.44 |
| Grounding DINO | Swin-B | Bud | Developing bud | 34.00 | 69.1 | 78.57 | 73.33 |
| Grounding DINO | Swin-T | Berry | Individual blueberry | 36.00 | 71.6 | 89.58 | 48.86 |
| Grounding DINO | Swin-B | Berry | Blueberry | 36.00 | 68.2 | 80.33 | 55.68 |
| Grounding DINO | Swin-T | Fescue | Patches of fescue | 24.00 | 80.4 | 84.21 | 59.26 |
| Grounding DINO | Swin-B | Fescue | Fescue patches | 15.00 | 75.4 | 76.00 | 70.37 |
| Grounding DINO | Swin-T | Red leaf | A red leaf disease | 40.00 | 95.3 | 100.00 | 71.43 |
| Grounding DINO | Swin-B | Red leaf | A cluster of red leaves | 30.00 | 95.5 | 100.00 | 85.71 |
| YOLO-World | s | Bud | Bud with leaves | 0.12 | 39.4 | 42.11 | 17.78 |
| YOLO-World | m | Bud | Bud with leaves | 0.70 | 15.7 | 16.67 | 4.44 |
| YOLO-World | l | Bud | Bud and sprouting leaves | 0.31 | 31.9 | 17.58 | 35.56 |
| YOLO-World | x | Bud | Bud with leaves | 0.28 | 44.0 | 55.26 | 46.67 |
| YOLO-World | s | Berry | A blue fruit | 0.80 | 55.7 | 55.77 | 32.95 |
| YOLO-World | m | Berry | Spherical blueberry | 0.12 | 59.4 | 52.86 | 42.05 |
| YOLO-World | l | Berry | A single, round blueberry | 4.00 | 66.1 | 56.00 | 47.73 |
| YOLO-World | x | Berry | A blue fruit | 1.15 | 63.8 | 64.71 | 50.00 |
| YOLO-World | s | Fescue | Grass spots | 0.04 | 29.3 | 6.67 | 3.70 |
| YOLO-World | m | Fescue | Fescue grass regions | 0.01 | 22.3 | 0.00 | 0.00 |
| YOLO-World | l | Fescue | Fescue grass spots | 0.00 | 29.0 | 0.00 | 0.00 |
| YOLO-World | x | Fescue | Fescue grass regions | 0.23 | 24.7 | 0.00 | 0.00 |
| YOLO-World | s | Red leaf | Bright red leaves | 3.70 | 50.0 | 44.44 | 57.14 |
| YOLO-World | m | Red leaf | A cluster of red leaves | 1.00 | 34.5 | 18.18 | 28.57 |
| YOLO-World | l | Red leaf | A cluster of red leaves | 1.40 | 50.9 | 2.96 | 85.71 |
| YOLO-World | x | Red leaf | A cluster of red leaves | 1.00 | 85.5 | 45.45 | 71.43 |
| Model | SAM2-Tiny | SAM2-Small | SAM2-Base | SAM2-Large |
|---|---|---|---|---|
| YOLO-World-s | 0.473 (±0.254) abc | 0.478 (±0.253) abc | 0.447 (±0.279) abc | 0.474 (±0.252) abc |
| YOLO-World-m | 0.375 (±0.181) abc | 0.360 (±0.096) bc | 0.346 (±0.118) c | 0.373 (±0.177) b |
| YOLO-World-l | 0.353 (±0.300) c | 0.348 (±0.300) c | 0.389 (±0.319) abc | 0.414 (±0.329) abc |
| YOLO-World-x | 0.610 (±0.178) abc | 0.597 (±0.192) abc | 0.583 (±0.202) abc | 0.586 (±0.205) abc |
| Swin-T | 0.642 (±0.215) abc | 0.645 (±0.215) abc | 0.641 (±0.212) abc | 0.644 (±0.214) abc |
| Swin-B | 0.743 (±0.152) ab | 0.748 (±0.155) a | 0.744 (±0.152) ab | 0.751 (±0.154) a |
| Model | SAM2-Tiny | SAM2-Small | SAM2-Base | SAM2-Large |
|---|---|---|---|---|
| YOLO-World-s | 0.631 (±0.164) | 0.627 (±0.168) | 0.631 (±0.154) | 0.660 (±0.182) |
| YOLO-World-m | 0.720 (±0.171) | 0.706 (±0.172) | 0.705 (±0.166) | 0.705 (±0.174) |
| YOLO-World-l | 0.704 (±0.204) | 0.704 (±0.207) | 0.702 (±0.204) | 0.705 (±0.203) |
| YOLO-World-x | 0.687 (±0.181) | 0.679 (±0.173) | 0.675 (±0.178) | 0.685 (±0.177) |
| Swin-T | 0.726 (±0.191) | 0.728 (±0.188) | 0.720 (±0.189) | 0.730 (±0.189) |
| Swin-B | 0.731 (±0.200) | 0.738 (±0.189) | 0.723 (±0.202) | 0.738 (±0.194) |
| Model | SAM2-Tiny | SAM2-Small | SAM2-Base | SAM2-Large |
|---|---|---|---|---|
| YOLO-World-s | 0.466 (±0.419) abcd | 0.426 (±0.432) abcd | 0.484 (±0.428) abcd | 0.425 (±0.440) abcd |
| YOLO-World-m | 0.245 (±0.296) d | 0.280 (±0.367) d | 0.331 (±0.351) cd | 0.351 (±0.367) bcd |
| YOLO-World-l | 0.290 (±0.392) d | 0.284 (±0.383) d | 0.223 (±0.342) d | 0.286 (±0.407) d |
| YOLO-World-x | 0.835 (±0.176) ab | 0.885 (±0.097) a | 0.812 (±0.245) abc | 0.854 (±0.224) a |
| Swin-T | 0.884 (±0.110) a | 0.878 (±0.112) a | 0.814 (±0.275) abc | 0.905 (±0.114) a |
| Swin-B | 0.866 (±0.135) a | 0.867 (±0.130) a | 0.843 (±0.212) ab | 0.887 (±0.131) a |
| Model | SAM2-Tiny | SAM2-Small | SAM2-Base | SAM2-Large |
|---|---|---|---|---|
| YOLO-World-s | 0.115 (±0.077) b | 0.115 (±0.077) b | 0.113 (±0.080) b | 0.112 (±0.080) b |
| YOLO-World-m | 0.127 (±0.083) b | 0.126 (±0.082) b | 0.113 (±0.080) b | 0.112 (±0.080) b |
| YOLO-World-l | 0.115 (±0.077) b | 0.115 (±0.077) b | 0.115 (±0.077) b | 0.115 (±0.077) b |
| YOLO-World-x | 0.115 (±0.077) b | 0.115 (±0.077) b | 0.112 (±0.080) b | 0.113 (±0.081) b |
| Swin-T | 0.651 (±0.217) a | 0.660 (±0.226) a | 0.681 (±0.231) a | 0.694 (±0.175) a |
| Swin-B | 0.606 (±0.227) a | 0.626 (±0.240) a | 0.647 (±0.231) a | 0.647 (±0.223) a |
| Model | SAM2-Tiny (s/Image) | SAM2-Small (s/Image) | SAM2-Base (s/Image) | SAM2-Large (s/Image) |
|---|---|---|---|---|
| YOLO-World-s | 0.59 (±0.10) efg | 0.58 (±0.10) efg | 0.65 (±0.11) cdefg | 0.84 (±0.10) bcdef |
| YOLO-World-m | 0.60 (±0.12) defg | 0.60 (±0.10) defg | 0.65 (±0.10) cdefg | 0.86 (±0.11) abcde |
| YOLO-World-l | 0.97 (±0.54) abcd | 1.00 (±0.60) abc | 1.03 (±0.57) ab | 1.23 (±0.54) a |
| YOLO-World-x | 0.64 (±0.11) cdefg | 0.64 (±0.10) cdefg | 0.71 (±0.11) bcdefg | 0.90 (±0.11) abcde |
| Swin-T | 0.30 (±0.10) g | 0.30 (±0.10) g | 0.33 (±0.10) g | 0.42 (±0.11) fg |
| Swin-B | 0.35 (±0.11) g | 0.36 (±0.10) g | 0.38 (±0.10) g | 0.48 (±0.13) efg |
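The lowercase letters in the tables above denote Tukey HSD groupings following an analysis of variance (both ANOVA and HSD appear in the Abbreviations): entries sharing a letter are not significantly different. As an illustrative sketch only (pure Python, hypothetical data, not the authors' analysis code), the one-way ANOVA F statistic that precedes the post hoc Tukey test is:

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group vs. within-group variance ratio."""
    k = len(groups)                      # number of treatment groups
    n = sum(len(g) for g in groups)      # total observations
    grand = sum(sum(g) for g in groups) / n
    # Between-group sum of squares (k - 1 degrees of freedom)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares (n - k degrees of freedom)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical per-image mIoU scores for three model pairings
a = [0.74, 0.75, 0.73, 0.76]
b = [0.64, 0.65, 0.63, 0.66]
c = [0.47, 0.48, 0.46, 0.45]
f_stat = one_way_anova_f([a, b, c])
print(round(f_stat, 1))  # a large F motivates the pairwise Tukey HSD comparisons
```

A significant F says only that some group means differ; the Tukey HSD step then compares each pair of means against a studentized-range threshold to assign the shared-letter groupings.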
Share and Cite
Mullins, C.C.; Esau, T.J.; Johnstone, R.; Toombs, C.L.; Hennessy, P.J. Enhanced Image Annotation in Wild Blueberry (Vaccinium angustifolium Ait.) Fields Using Sequential Zero-Shot Detection and Segmentation Models. Sensors 2025, 25, 7325. https://doi.org/10.3390/s25237325