On the Minimum Dataset Requirements for Fine-Tuning an Object Detector for Arable Crop Plant Counting: A Case Study on Maize Seedlings
Abstract
1. Introduction
1.1. Arable Crop Plant Counting by Object Detection
- Crop Establishment Assessment: Early-season seedling counts determine whether or not replanting is necessary, directly affecting yield potential and economic returns [1].
- Precision Agriculture: Individual plant locations enable variable-rate application of inputs (fertilizers, pesticides) and selective harvesting, reducing costs and environmental impact [2].
- Plant Breeding Programs: Automated counting accelerates phenotyping workflows, enabling evaluation of larger populations and more traits [3].
- Insurance and Compliance: Standardized counting methods support crop insurance assessments and regulatory compliance for agricultural subsidies [4].
- Research Applications: Consistent counting methods enable meta-analyses across studies and reproducible research in agronomy [5].
1.2. Evolution of Object Detection Methods
1.3. Data-Efficient Detection Methods
1.4. Dataset Requirements for Agricultural Object Detection
1.5. Study Aim
- How training data source (in-domain vs. out-of-distribution) affects required dataset size
- Minimum dataset size needed to achieve benchmark performance (R2 = 0.85) for different model architectures
- Minimum annotation quality required to maintain benchmark performance
- Whether or not newer few-shot and zero-shot approaches can meet agricultural performance standards with reduced annotation requirements
- The potential role of handcrafted methods in modern deep learning pipelines
2. Materials and Methods
2.1. Datasets
2.2. Handcrafted Object Detector
2.3. Deep Learning Object Detectors
- YOLOv5 uses a CSP (Cross Stage Partial) backbone with a PANet neck for feature aggregation, achieving efficient detection through grid-based prediction with multiple anchors per cell. It was selected as the baseline CNN architecture because of its dominance in agricultural applications [33] and its extensive use in crop monitoring studies [45,46], making it a well-established reference point for minimum dataset requirements in agricultural contexts.
- YOLOv8 improves upon YOLOv5 by adopting a more efficient C2f block in its backbone, an anchor-free detection head, and a task-specific decoupled head design for better accuracy–speed trade-offs. We included this next-generation YOLO evolution to evaluate whether its architectural improvements could reduce dataset requirements compared to YOLOv5, given its reported superior accuracy–efficiency trade-offs [60].
- YOLO11 further refines the architecture with multi-scale deformable attention, improving small object detection, a crucial capability for seedling identification in varied field conditions. It was chosen as the most recent YOLO variant that incorporates an attention (Transformer-style) mechanism, allowing us to assess whether such state-of-the-art Transformer-mixed improvements affect minimum dataset requirements for small object detection.
- RT-DETR represents a hybrid approach combining a CNN backbone with a Transformer decoder, using deformable attention for adaptive feature sampling and parallel prediction heads for real-time performance. Unlike the pure CNN-based YOLO variants, RT-DETR’s Transformer components model global relationships between objects. We selected it over pure Transformer detectors (such as DETR) because of its real-time capability and proven performance on agricultural datasets [36]; its inclusion allows a direct comparison between CNN-only and hybrid approaches in terms of minimum dataset requirements. A minimal loading sketch for these four architectures is given after this list.
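The following sketch shows how the four many-shot detectors can be instantiated through the Ultralytics Python API (Jocher et al., cited in the references); the checkpoint names and model scales are illustrative assumptions rather than the exact weights used in the experiments.

```python
# Minimal sketch: instantiating the four many-shot detectors with the
# Ultralytics API. Checkpoint names are illustrative assumptions; the
# model scales used in the paper may differ.
from ultralytics import YOLO, RTDETR

models = {
    "YOLOv5": YOLO("yolov5nu.pt"),     # CSP backbone + PANet neck
    "YOLOv8": YOLO("yolov8n.pt"),      # C2f blocks, anchor-free decoupled head
    "YOLO11": YOLO("yolo11n.pt"),      # adds attention-based modules
    "RT-DETR": RTDETR("rtdetr-l.pt"),  # CNN backbone + Transformer decoder
}

# Report parameter counts (millions) for a rough size comparison.
for name, model in models.items():
    n_params = sum(p.numel() for p in model.model.parameters()) / 1e6
    print(f"{name}: {n_params:.1f} M parameters")
```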
- batch size: 16 (increased from default 8 to maximize GPU utilization while maintaining stable gradients)
- maximum training epochs: 200 (extended from default 100 to ensure convergence with small datasets)
- maximum training epochs without improvement: 15 (increased from default 10 for early stopping to allow longer plateau exploration)
- precision: mixed (to balance training speed and numerical accuracy); see the training-call sketch after this list
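Assuming the Ultralytics training interface, the hyperparameters above map onto a call of roughly the following form; the dataset configuration file name is a placeholder, not the paper's actual path.

```python
# Hedged sketch of the training call corresponding to the listed hyperparameters.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # any of the architectures listed above
results = model.train(
    data="maize_seedlings.yaml",  # placeholder dataset config (train/val image lists)
    epochs=200,                   # extended from the default 100 to ensure convergence on small datasets
    patience=15,                  # stop after 15 epochs without improvement
    batch=16,                     # increased from 8 to better utilize the GPU
    amp=True,                     # mixed-precision training
)
```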
- CD-ViTO uses cross-domain prototype matching, in which a small set of annotated examples (shots) serves as class prototypes. It leverages Vision Transformer (ViT) backbones to extract feature representations and computes the similarity between query images and prototypes for object localization, an architecture designed specifically for scenarios with limited training data (a conceptual prototype-matching sketch is given after this list).
- OWLv2 represents a fundamentally different paradigm that requires no labeled examples of the target class. It leverages large-scale pre-training on image–text pairs to connect visual features with natural language descriptions, so that at inference it detects objects from text prompts alone, eliminating the need for class-specific training data entirely.
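As a conceptual illustration only (not the CD-ViTO implementation), prototype matching reduces to averaging the ViT embeddings of the support shots and ranking candidate regions of a query image by cosine similarity; the function names and tensor shapes below are assumptions made for the sketch.

```python
# Conceptual prototype-matching sketch (not the actual CD-ViTO code):
# the class prototype is the mean ViT embedding of the annotated support
# crops; candidate regions in a query image are scored by cosine similarity.
import torch
import torch.nn.functional as F

def build_prototype(support_feats: torch.Tensor) -> torch.Tensor:
    """support_feats: (k_shots, dim) ViT embeddings of annotated seedling crops."""
    return F.normalize(support_feats.mean(dim=0), dim=-1)

def score_candidates(candidate_feats: torch.Tensor, prototype: torch.Tensor) -> torch.Tensor:
    """candidate_feats: (n_boxes, dim) embeddings of query-image region proposals."""
    return F.normalize(candidate_feats, dim=-1) @ prototype  # cosine similarity per box

# Toy usage with random features standing in for ViT/DINOv2 outputs.
proto = build_prototype(torch.randn(10, 768))          # 10-shot prototype
scores = score_candidates(torch.randn(200, 768), proto)
keep = scores > 0.5                                     # simple similarity threshold
```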
- Base models: Trained by self-training with the OWL-ST method, which generates pseudo-box annotations from web-scale image–text datasets.
- Fine-tuned models: Further trained on human-annotated object detection datasets.
- Ensemble models: Weight-space ensembles of the self-trained and fine-tuned versions, balancing open-vocabulary generalization with task-specific performance. An inference sketch using text prompts is given after this list.
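A minimal zero-shot inference sketch with the Hugging Face Transformers implementation of OWLv2 is shown below; the published "ensemble" checkpoint name, the image path, and the score threshold are assumptions made for illustration, while the prompt strings mirror those listed in Appendix A.2.

```python
# Sketch of zero-shot maize seedling detection with OWLv2 via Transformers.
# Checkpoint, image path, and threshold are illustrative assumptions.
import torch
from PIL import Image
from transformers import Owlv2Processor, Owlv2ForObjectDetection

processor = Owlv2Processor.from_pretrained("google/owlv2-base-patch16-ensemble")
model = Owlv2ForObjectDetection.from_pretrained("google/owlv2-base-patch16-ensemble")

image = Image.open("tile.png").convert("RGB")       # one orthomosaic tile
prompts = [["maize seedling", "young maize plants from above"]]  # cf. Appendix A.2

inputs = processor(text=prompts, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

target_sizes = torch.tensor([image.size[::-1]])     # (height, width)
detections = processor.post_process_object_detection(
    outputs, threshold=0.1, target_sizes=target_sizes
)[0]
count = len(detections["boxes"])                    # per-tile seedling count
```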
2.4. Minimum Dataset Size and Quality Modelling
3. Results
3.1. Handcrafted Object Detector
3.2. Many-Shot Object Detectors
3.2.1. OOD Training
3.2.2. ID Training
3.3. Few-Shot Object Detectors
3.4. Zero-Shot Object Detectors
4. Discussion
4.1. Dataset Source Impact on Object Detection Performance
4.2. Many-Shot Object Detection: Architecture and Dataset Requirements
4.3. Dataset Quality Trade-Offs
4.4. Few-Shot and Zero-Shot Approaches: Current Limitations
4.5. Handcrafted Methods in the Deep Learning Era
4.6. Implications for Practical Applications
4.7. Future Work
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
R2 | Coefficient of determination
RMSE | Root Mean Squared Error
MAPE | Mean Absolute Percentage Error
mAP | Mean Average Precision
HC | Handcrafted |
ID | In-Distribution |
OOD | Out-of-Distribution |
CNN | Convolutional Neural Network |
ViT | Vision Transformer |
YOLO | You Only Look Once |
RT-DETR | Real-Time Detection Transformer |
GoF | Goodness of Fit |
Appendix A
Appendix A.1
Algorithm A1: H1
Algorithm A2: H2
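The pseudocode tables for Algorithms A1 (H1) and A2 (H2) are not reproduced here. As a loose illustration of a handcrafted pipeline of the kind considered in Section 2.2, the sketch below chains an excess-green vegetation index, Otsu thresholding, morphological cleaning, and connected-component counting; these steps, thresholds, and kernel sizes are assumptions and do not reproduce H1 or H2.

```python
# Loose illustration of a handcrafted seedling counter: excess-green (ExG)
# vegetation index, Otsu thresholding, morphological opening, and
# connected-component counting. Parameters are assumptions, not the paper's.
import cv2
import numpy as np

def count_seedlings(tile_path: str, min_area_px: int = 50) -> int:
    bgr = cv2.imread(tile_path).astype(np.float32) / 255.0
    b, g, r = cv2.split(bgr)
    exg = 2 * g - r - b                                      # excess-green index
    exg8 = cv2.normalize(exg, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, mask = cv2.threshold(exg8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # remove speckle noise
    n_labels, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    # Label 0 is the background; keep components above the minimum area.
    return int(np.sum(stats[1:, cv2.CC_STAT_AREA] >= min_area_px))
```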
Appendix A.2
- “maize”
- “seedling”
- “plant”
- “aerial view of maize seedlings”
- “corn seedlings in rows”
- “young maize plants from above”
- “crop rows with corn seedlings”
- “maize seedlings with regular spacing”
- “top-down view of corn plants”
- “agricultural field with maize seedlings”
- “orthomosaic of corn plants in rows”
References
- Blandino, M.; Testa, G.; Quaglini, L.; Reyneri, A. Effetto Della Densità Colturale e Dell’Applicazione di Fungicidi Sulla Produzione e la Qualità del Mais da Granella e da Trinciato; ITA: Rome, Italy, 2016. [Google Scholar]
- Lu, D.; Ye, J.; Wang, Y.; Yu, Z. Plant Detection and Counting: Enhancing Precision Agriculture in UAV and General Scenes. IEEE Access 2023, 11, 116196–116205. [Google Scholar] [CrossRef]
- Saatkamp, A.; Cochrane, A.; Commander, L.; Guja, L.; Jimenez-Alfaro, B.; Larson, J.; Nicotra, A.; Poschlod, P.; Silveira, F.A.O.; Cross, A.; et al. A research agenda for seed-trait functional ecology. New Phytol. 2019, 221, 1764–1775. [Google Scholar] [CrossRef] [PubMed]
- De Petris, S.; Sarvia, F.; Gullino, M.; Tarantino, E.; Borgogno-Mondino, E. Sentinel-1 Polarimetry to Map Apple Orchard Damage after a Storm. Remote Sens. 2021, 13, 1030. [Google Scholar] [CrossRef]
- PP 1/333 (1) Adoption of Digital Technology for Data Generation for the Efficacy Evaluation of Plant Protection Products. EPPO Bull. 2025, 55, 14–19. [CrossRef]
- Zou, H.; Lu, H.; Li, Y.; Liu, L.; Cao, Z. Maize Tassels Detection: A Benchmark of the State of the Art. Plant Methods 2020, 16, 108. [Google Scholar] [CrossRef]
- Lin, T.Y.; Maire, M.; Belongie, S.; Bourdev, L.; Girshick, R.; Hays, J.; Perona, P.; Ramanan, D.; Zitnick, C.L.; Dollár, P. Microsoft COCO: Common Objects in Context. arXiv 2015, arXiv:1405.0312. [Google Scholar]
- Kraus, K. Photogrammetry: Geometry from Images and Laser Scans; De Gruyter: Berlin, Germany, 2011. [Google Scholar] [CrossRef]
- Pugh, N.A.; Thorp, K.R.; Gonzalez, E.M.; Elshikha, D.E.M.; Pauli, D. Comparison of image georeferencing strategies for agricultural applications of small unoccupied aircraft systems. Plant Phenome J. 2021, 4, e20026. [Google Scholar] [CrossRef]
- Dhonju, H.K.; Walsh, K.B.; Bhattarai, T. Web Mapping for Farm Management Information Systems: A Review and Australian Orchard Case Study. Agronomy 2023, 13, 2563. [Google Scholar] [CrossRef]
- Habib, A.; Han, Y.; Xiong, W.; He, F.; Zhang, Z.; Crawford, M. Automated Ortho-Rectification of UAV-Based Hyperspectral Data over an Agricultural Field Using Frame RGB Imagery. Remote Sens. 2016, 8, 796. [Google Scholar] [CrossRef]
- De Petris, S.; Sarvia, F.; Borgogno-Mondino, E. RPAS-based photogrammetry to support tree stability assessment: Longing for precision arboriculture. Urban For. Urban Green. 2020, 55, 126862. [Google Scholar] [CrossRef]
- Zhang, S.; Barrett, H.A.; Baros, S.V.; Neville, P.R.H.; Talasila, S.; Sinclair, L.L. Georeferencing Accuracy Assessment of Historical Aerial Photos Using a Custom-Built Online Georeferencing Tool. ISPRS Int. J. Geo-Inf. 2022, 11, 582. [Google Scholar] [CrossRef]
- Farjon, G.; Huijun, L.; Edan, Y. Deep-learning-based counting methods, datasets, and applications in agriculture: A review. Precis. Agric. 2023, 24, 1683–1711. [Google Scholar] [CrossRef]
- Meier, U.; Bleiholder, H.; Buhr, L.; Feller, C.; Hack, H.; Heß, M.; Lancashire, P.D.; Schnock, U.; Stauß, R.; van den Boom, T.; et al. The BBCH System to Coding the Phenological Growth Stages of Plants-History and Publications. J. Für Kult. 2009, 61, 41–52. [Google Scholar] [CrossRef]
- David, E.; Daubige, G.; Joudelat, F.; Burger, P.; Comar, A.; de Solan, B.; Baret, F. Plant Detection and Counting from High-Resolution RGB Images Acquired from UAVs: Comparison between Deep-Learning and Handcrafted Methods with Application to Maize, Sugar Beet, and Sunflower. bioRxiv 2021. [Google Scholar] [CrossRef]
- Liu, W.; Zhou, J.; Wang, B.; Costa, M.; Kaeppler, S.M.; Zhang, Z. IntegrateNet: A Deep Learning Network for Maize Stand Counting From UAV Imagery by Integrating Density and Local Count Maps. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6512605. [Google Scholar] [CrossRef]
- Maize_seeding Dataset > Overview. Available online: https://universe.roboflow.com/objectdetection-hytat/maize_seeding (accessed on 20 June 2025).
- Maize-Seedling-Detection Dataset > Overview. Available online: https://universe.roboflow.com/fyxdds-icloud-com/maize-seedling-detection (accessed on 20 June 2025).
- FAO. Agricultural Production Statistics 2010–2023; Volume Analytical Briefs; FAOSTAT: Rome, Italy, 2024. [Google Scholar]
- Torres-Sánchez, J.; Mesas-Carrascosa, F.J.; Jiménez-Brenes, F.M.; de Castro, A.I.; López-Granados, F. Early Detection of Broad-Leaved and Grass Weeds in Wide Row Crops Using Artificial Neural Networks and UAV Imagery. Agronomy 2021, 11, 749. [Google Scholar] [CrossRef]
- Zhang, Z.; Cao, R.; Peng, C.; Liu, R.; Sun, Y.; Zhang, M.; Li, H. Cut-edge detection method for rice harvesting based on machine vision. Agronomy 2020, 10, 590. [Google Scholar] [CrossRef]
- García-Martínez, H.; Flores-Magdaleno, H.; Khalil-Gardezi, A.; Ascencio-Hernández, R.; Tijerina-Chávez, L.; Vázquez-Peña, M.A.; Mancilla-Villa, O.R. Digital Count of Corn Plants Using Images Taken by Unmanned Aerial Vehicles and Cross Correlation of Templates. Agronomy 2020, 10, 469. [Google Scholar] [CrossRef]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; NIPS’17. pp. 6000–6010. [Google Scholar]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. arXiv 2020, arXiv:2005.12872. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929. [Google Scholar]
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Fei-Fei, L. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
- Zong, Z.; Song, G.; Liu, Y. DETRs with Collaborative Hybrid Assignments Training. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 6725–6735. [Google Scholar] [CrossRef]
- Khan, A.; Rauf, Z.; Sohail, A.; Khan, A.R.; Asif, H.; Asif, A.; Farooq, U. A Survey of the Vision Transformers and Their CNN-transformer Based Variants. Artif. Intell. Rev. 2023, 56, 2917–2970. [Google Scholar] [CrossRef]
- Badgujar, C.M.; Poulose, A.; Gan, H. Agricultural Object Detection with You Only Look Once (YOLO) Algorithm: A Bibliometric and Systematic Literature Review. Comput. Electron. Agric. 2024, 223, 109090. [Google Scholar] [CrossRef]
- Rekavandi, A.M.; Rashidi, S.; Boussaid, F.; Hoefs, S.; Akbas, E.; Bennamoun, M. Transformers in Small Object Detection: A Benchmark and Survey of State-of-the-Art. arXiv 2023, arXiv:2309.04902. [Google Scholar]
- Li, Y.; Miao, N.; Ma, L.; Shuang, F.; Huang, X. Transformer for Object Detection: Review and Benchmark. Eng. Appl. Artif. Intell. 2023, 126, 107021. [Google Scholar] [CrossRef]
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-time Object Detection. arXiv 2024, arXiv:2304.08069. [Google Scholar]
- Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar]
- Li, Z.; Zhou, F.; Chen, F.; Li, H. Meta-SGD: Learning to Learn Quickly for Few-Shot Learning. arXiv 2017, arXiv:1707.09835. [Google Scholar]
- Bansal, A.; Sikka, K.; Sharma, G.; Chellappa, R.; Divakaran, A. Zero-Shot Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 384–400. [Google Scholar]
- Kang, B.; Liu, Z.; Wang, X.; Yu, F.; Feng, J.; Darrell, T. Few-Shot Object Detection via Feature Reweighting. arXiv 2019, arXiv:1812.01866. [Google Scholar]
- Minderer, M.; Gritsenko, A.; Houlsby, N. Scaling Open-Vocabulary Object Detection. Adv. Neural Inf. Process. Syst. 2023, 36, 72983–73007. [Google Scholar]
- Liu, S.; Zeng, Z.; Ren, T.; Li, F.; Zhang, H.; Yang, J.; Jiang, Q.; Li, C.; Yang, J.; Su, H.; et al. Grounding DINO: Marrying DINO with Grounded Pre-training for Open-Set Object Detection. In Proceedings of the Computer Vision—ECCV 2024, Milan, Italy, 29 September–4 October 2024; Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G., Eds.; Springer: Cham, Switzerland, 2025; pp. 38–55. [Google Scholar] [CrossRef]
- Karami, A.; Crawford, M.; Delp, E.J. Automatic Plant Counting and Location Based on a Few-Shot Learning Technique. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5872–5886. [Google Scholar] [CrossRef]
- Wang, D.; Parthasarathy, R.; Pan, X. Advancing Image Recognition: Towards Lightweight Few-shot Learning Model for Maize Seedling Detection. In Proceedings of the 2024 International Conference on Smart City and Information System, Kuala Lumpur, Malaysia, 17–19 May 2024; pp. 635–639. [Google Scholar] [CrossRef]
- Barreto, A.; Lottes, P.; Ispizua Yamati, F.R.; Baumgarten, S.; Wolf, N.A.; Stachniss, C.; Mahlein, A.K.; Paulus, S. Automatic UAV-based Counting of Seedlings in Sugar-Beet Field and Extension to Maize and Strawberry. Comput. Electron. Agric. 2021, 191, 106493. [Google Scholar] [CrossRef]
- Kitano, B.T.; Mendes, C.C.T.; Geus, A.R.; Oliveira, H.C.; Souza, J.R. Corn Plant Counting Using Deep Learning and UAV Images. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1–5. [Google Scholar] [CrossRef]
- Andvaag, E.; Krys, K.; Shirtliffe, S.J.; Stavness, I. Counting Canola: Toward Generalizable Aerial Plant Detection Models. Plant Phenomics 2024, 6, 0268. [Google Scholar] [CrossRef]
- Sun, C.; Shrivastava, A.; Singh, S.; Gupta, A. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era. arXiv 2017, arXiv:1707.02968. [Google Scholar]
- Alhazmi, K.; Alsumari, W.; Seppo, I.; Podkuiko, L.; Simon, M. Effects of Annotation Quality on Model Performance. In Proceedings of the 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Jeju Island, Republic of Korea, 13–16 April 2021; pp. 063–067. [Google Scholar] [CrossRef]
- Hestness, J.; Narang, S.; Ardalani, N.; Diamos, G.; Jun, H.; Kianinejad, H.; Patwary, M.M.A.; Yang, Y.; Zhou, Y. Deep Learning Scaling Is Predictable, Empirically. arXiv 2017, arXiv:1712.00409. [Google Scholar]
- Mahmood, R.; Lucas, J.; Acuna, D.; Li, D.; Philion, J.; Alvarez, J.M.; Yu, Z.; Fidler, S.; Law, M.T. How Much More Data Do I Need? Estimating Requirements for Downstream Tasks. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 275–284. [Google Scholar] [CrossRef]
- Nguyen, N.D.; Do, T.; Ngo, T.D.; Le, D.D.; Valenti, C.F. An Evaluation of Deep Learning Methods for Small Object Detection. JECE 2020, 2020, 8856387. [Google Scholar] [CrossRef]
- Du, X.; Lin, T.Y.; Jin, P.; Ghiasi, G.; Tan, M.; Cui, Y.; Le, Q.V.; Song, X. SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11592–11601. [Google Scholar]
- Shorten, C.; Khoshgoftaar, T.M. A Survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
- Liu, S.; Yin, D.; Feng, H.; Li, Z.; Xu, X.; Shi, L.; Jin, X. Estimating Maize Seedling Number with UAV RGB Images and Advanced Image Processing Methods. Precis. Agric. 2022, 23, 1604–1632. [Google Scholar] [CrossRef]
- Velumani, K.; Lopez-Lozano, R.; Madec, S.; Guo, W.; Gillet, J.; Comar, A.; Baret, F. Estimates of Maize Plant Density from UAV RGB Images Using Faster-RCNN Detection Model: Impact of the Spatial Resolution. Plant Phenomics 2021, 2021, 9824843. [Google Scholar] [CrossRef]
- Bumbaca, S. The Original Dataset for the Paper “On the minimum dataset requirements for fine-tuning an object detector for arable crop plant counting: A case study on maize seedlings”. Zenodo 2025. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Nice, France, 2012; Volume 25. [Google Scholar]
- Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. In Readings in Computer Vision; Fischler, M.A., Firschein, O., Eds.; Morgan Kaufmann: San Francisco, CA, USA, 1987; pp. 726–740. [Google Scholar] [CrossRef]
- Terven, J.; Córdova-Esparza, D.M.; Romero-González, J.A. A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
- Jocher, G.; Qiu, J.; Chaurasia, A. GitHub Ultralytics YOLO. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 16 April 2025).
- Oquab, M.; Darcet, T.; Moutakanni, T.; Vo, H.; Szafraniec, M.; Khalidov, V.; Fernandez, P.; Haziza, D.; Massa, F.; El-Nouby, A.; et al. DINOv2: Learning Robust Visual Features without Supervision. arXiv 2024, arXiv:2304.07193. [Google Scholar]
- Fu, Y.; Wang, Y.; Pan, Y.; Huai, L.; Qiu, X.; Shangguan, Z.; Liu, T.; Fu, Y.; Gool, L.V.; Jiang, X. Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector. arXiv 2024, arXiv:2402.03094. [Google Scholar]
- Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, 16–20 November 2020; pp. 38–45. [Google Scholar]
- Zhu, H.; Qin, S.; Su, M.; Lin, C.; Li, A.; Gao, J. Harnessing Large Vision and Language Models in Agriculture: A Review. arXiv 2024, arXiv:2407.19679. [Google Scholar]
- Zhou, Y.; Yan, H.; Ding, K.; Cai, T.; Zhang, Y. Few-Shot Image Classification of Crop Diseases Based on Vision–Language Models. Sensors 2024, 24, 6109. [Google Scholar] [CrossRef]
- Chen, H.; Huang, W.; Ni, Y.; Yun, S.; Liu, Y.; Wen, F.; Velasquez, A.; Latapie, H.; Imani, M. TaskCLIP: Extend Large Vision-Language Model for Task Oriented Object Detection. arXiv 2024, arXiv:2403.08108. [Google Scholar]
- Chicco, D.; Warrens, M.J.; Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 2021, 7, e623. [Google Scholar] [CrossRef]
- Draper, N.R.; Smith, H. Applied Regression Analysis; John Wiley & Sons: Hoboken, NJ, USA, 1998; Volume 326. [Google Scholar]
- Armstrong, J.; Collopy, F. Error measures for generalizing about forecasting methods: Empirical comparisons. Int. J. Forecast. 1992, 8, 69–80. [Google Scholar] [CrossRef]
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
- Vianna, L.S.; Gonçalves, A.L.; Souza, J.A. Analysis of learning curves in predictive modeling using exponential curve fitting with an asymptotic approach. PLoS ONE 2024, 19, e0299811. [Google Scholar] [CrossRef]
- Akyon, F.C.; Altinuc, S.O.; Temizel, A. Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; pp. 966–970. [Google Scholar] [CrossRef]
- Xu, G.; Hao, Z.; Luo, Y.; Hu, H.; An, J.; Mao, S. DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices. arXiv 2023, arXiv:2309.05015. [Google Scholar] [CrossRef]
- Wu, K.; Otoo, E.; Shoshani, A. Optimizing Connected Component Labeling Algorithms. In Proceedings of the Medical Imaging 2005: Image Processing, San Diego, CA, USA, 12–17 February 2005. [Google Scholar]
Dataset | Phenological Stage | Train Size | Test Size |
---|---|---|---|
OOD Scientific | |||
DavidEtAl.2021 [16] | V3 | 182 tiles | N/A* |
LiuEtAl.2022 [55] | V3 | 596 tiles | N/A* |
OOD Internet | |||
OOD_int_1 [18] | V3 | 216 tiles | N/A* |
OOD_int_2 [19] | V5 | 174 tiles | N/A* |
ID [57] | |||
ID_1 | V3 | 150 tiles | 20 tiles |
ID_2 | V3 | 150 tiles | 20 tiles |
ID_3 | V5 | 150 tiles | 20 tiles |
Architecture | Shots | n * or S | s * or S | m * or B | l * or L | x * |
---|---|---|---|---|---|---|
YOLOv5 | many | 1.9 | 7.2 | 21.2 | 46.5 | 86.7 |
YOLOv8 | many | 3.2 | 11.2 | 25.9 | 43.7 | 68.2 |
YOLO11 | many | 4.0 | 12.5 | 28.0 | 50.0 | 75.0 |
RT-DETR | many | - | - | - | 60.0 | 80.0 |
CD-ViTO | few | - | 22.0 † | 86.0 ‡ | 307.0 § | -
OWLv2 | zero | - | - | 86.0 ‡ | 307.0 § | - |
Dataset | R2 | RMSE | MAPE | mAP | Tiles | Dataset %
---|---|---|---|---|---|---
ID_1 | 0.95 | 0.12 | 9% | 0.87 | 1184 | 7.8% |
ID_2 | 0.93 | 0.11 | 12% | 0.81 | 279 | 4.2% |
ID_3 | 0.87 | 0.18 | 16% | 0.73 | 158 | 1.8% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Bumbaca, S.; Borgogno-Mondino, E. On the Minimum Dataset Requirements for Fine-Tuning an Object Detector for Arable Crop Plant Counting: A Case Study on Maize Seedlings. Remote Sens. 2025, 17, 2190. https://doi.org/10.3390/rs17132190