Fast and Accurate Detection of Forty Types of Fruits and Vegetables: Dataset and Method
Abstract
1. Introduction
- Introduction of the FV40 Dataset: We develop a fine-grained fruit and vegetable object detection dataset. The initial version contains over 14,000 images and 100,000 annotated bounding boxes covering 40 distinct fruit and vegetable categories, and the dataset is continuously being expanded and updated.
- Development of the FVRT-DETR Algorithm: We propose FVRT-DETR, the first algorithm for fast and accurate detection of multiple fruit and vegetable types. The algorithm innovatively uses Mamba as the backbone and introduces a novel multi-scale deep feature fusion encoder (MDFF encoder) module, which strengthens the handling of multi-scale features while reducing parameters and improves detection accuracy for fruits and vegetables of varying sizes and appearances (a structural sketch follows this list).
- Improved Multi-Scale Feature Handling: To address the challenge of multi-scale fruit and vegetable detection, we propose an efficient feature fusion method within the MDFF encoder module that better integrates multi-scale feature maps, enhancing detection across the 40 fruit and vegetable categories with widely varying object sizes.
- Scalable and Highly Adaptable Model: Unlike previous models tailored to a single type of fruit or vegetable, FVRT-DETR is designed to be scalable and highly adaptable, accurately detecting a wide variety of fruit and vegetable types. This allows it to handle the diversity of produce in real-world agricultural scenarios, making it a versatile solution for the industry.
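Taken together, these contributions describe a pipeline in which a Mamba-based backbone produces multi-scale features, the MDFF encoder fuses them, and a DETR-style query decoder predicts boxes and classes. The PyTorch sketch below illustrates that composition only; it is a minimal, hypothetical skeleton, not the authors' implementation. `MambaStage`, `MDFFEncoder`, and `FVRTDETRSketch` are assumed names, plain convolutions stand in for the selective state space blocks, and a generic transformer decoder stands in for the RT-DETR-style head.

```python
# Illustrative sketch only: placeholder modules approximate the roles of the
# Mamba backbone, MDFF encoder, and DETR-style decoder described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MambaStage(nn.Module):
    """Placeholder for one backbone stage (a real model would use SSM blocks)."""

    def __init__(self, c_in, c_out):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
            nn.BatchNorm2d(c_out),
            nn.SiLU(),
        )

    def forward(self, x):
        return self.block(x)


class MDFFEncoder(nn.Module):
    """Placeholder multi-scale fusion: project, align, fuse, redistribute."""

    def __init__(self, channels=(128, 256, 512), dim=256):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, dim, 1) for c in channels)
        self.fuse = nn.Conv2d(dim * 3, dim, 3, padding=1)

    def forward(self, feats):  # feats: [C3, C4, C5] at strides 8/16/32
        target = feats[1].shape[-2:]
        aligned = [
            F.interpolate(p(f), size=target, mode="nearest")
            for p, f in zip(self.proj, feats)
        ]
        fused = self.fuse(torch.cat(aligned, dim=1))
        # Redistribute the fused map back to every detection scale.
        return [F.interpolate(fused, size=f.shape[-2:], mode="nearest") for f in feats]


class FVRTDETRSketch(nn.Module):
    """Backbone -> MDFF-style encoder -> query-based decoder (hypothetical)."""

    def __init__(self, num_classes=40, num_queries=300, dim=256):
        super().__init__()
        self.stem = nn.Conv2d(3, 64, 3, stride=2, padding=1)
        self.stages = nn.ModuleList(
            [MambaStage(64, 128), MambaStage(128, 256), MambaStage(256, 512)]
        )
        self.encoder = MDFFEncoder((128, 256, 512), dim)
        self.queries = nn.Embedding(num_queries, dim)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True), num_layers=2
        )
        self.cls_head = nn.Linear(dim, num_classes)
        self.box_head = nn.Linear(dim, 4)  # normalized cx, cy, w, h

    def forward(self, x):
        feats, x = [], self.stem(x)
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        # Flatten every fused scale into one token sequence for the decoder.
        memory = torch.cat(
            [f.flatten(2).transpose(1, 2) for f in self.encoder(feats)], dim=1
        )
        q = self.queries.weight.unsqueeze(0).expand(x.shape[0], -1, -1)
        h = self.decoder(q, memory)
        return self.cls_head(h), self.box_head(h).sigmoid()


if __name__ == "__main__":
    logits, boxes = FVRTDETRSketch()(torch.randn(1, 3, 256, 256))
    print(logits.shape, boxes.shape)  # (1, 300, 40) and (1, 300, 4)
```

The structural point of the sketch is that the encoder first aligns all backbone scales to a common width and resolution, fuses them, and then redistributes the fused representation to every scale before the decoder attends over the flattened multi-scale tokens.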
2. Related Work
Fruit and Vegetable Detection Algorithm
3. Our Method
3.1. Mamba Backbone
3.2. MDFF Encoder
4. Experimental Details
4.1. Dataset
4.2. Experimental Setup
4.3. Evaluation Metrics
5. Results and Analysis
5.1. Benchmark Algorithm Performance Evaluation
5.2. Ablation Study
5.2.1. Mamba Backbone
5.2.2. MDFF Encoder
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Parico, A.I.B.; Ahamed, T. Real Time Pear Fruit Detection and Counting Using YOLOv4 Models and Deep SORT. Sensors 2021, 21, 4803. [Google Scholar] [CrossRef] [PubMed]
- Gao, F.; Fang, W.; Sun, X.; Wu, Z.; Zhao, G.; Li, G.; Li, R.; Fu, L.; Zhang, Q. A Novel Apple Fruit Detection and Counting Methodology Based on Deep Learning and Trunk Tracking in Modern Orchard. Comput. Electron. Agric. 2022, 197, 107000. [Google Scholar]
- Tang, Y.; Qiu, J.; Zhang, Y.; Wu, D.; Cao, Y.; Zhao, K.; Zhu, L. Optimization Strategies of Fruit Detection to Overcome the Challenge of Unstructured Background in Field Orchard Environment: A Review. Precis. Agric. 2023, 24, 1183–1219. [Google Scholar]
- Zhang, W.; Wang, J.; Liu, Y.; Chen, K.; Li, H.; Duan, Y.; Wu, W.; Shi, Y.; Guo, W. Deep-Learning-Based In-Field Citrus Fruit Detection and Tracking. Hortic. Res. 2022, 9, uhac003. [Google Scholar]
- Mirhaji, H.; Soleymani, M.; Asakereh, A.; Mehdizadeh, S.A. Fruit Detection and Load Estimation of an Orange Orchard Using the YOLO Models Through Simple Approaches in Different Imaging and Illumination Conditions. Comput. Electron. Agric. 2021, 191, 106533. [Google Scholar]
- Tang, Y.; Zhou, H.; Wang, H.; Zhang, Y. Fruit Detection and Positioning Technology for a Camellia Oleifera C. Abel Orchard Based on Improved YOLOv4-Tiny Model and Binocular Stereo Vision. Expert Syst. Appl. 2023, 211, 118573. [Google Scholar]
- Mao, D.; Sun, H.; Li, X.; Yu, X.; Wu, J.; Zhang, Q. Real-Time Fruit Detection Using Deep Neural Networks on CPU (RTFD): An Edge AI Application. Comput. Electron. Agric. 2023, 204, 107517. [Google Scholar]
- Seth, K. Fruits and Vegetables Image Recognition Dataset. 2021. Available online: https://www.kaggle.com/datasets/kritikseth/fruit-and-vegetable-image-recognition (accessed on 1 January 2025).
- Muresan, H.; Oltean, M. Fruit Recognition from Images Using Deep Learning. Acta Univ. Sapientiae Inform. 2018, 10, 26–42. [Google Scholar]
- Latif, G.; Mohammad, N.; Alghazo, J. DeepFruit: A dataset of fruit images for fruit classification and calories calculation. Data Brief 2023, 50, 109524. [Google Scholar]
- Gu, A.; Dao, T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv 2023, arXiv:2312.00752. [Google Scholar]
- Wang, Z.; Li, C.; Xu, H.; Zhu, X. Mamba YOLO: SSMs-Based YOLO for Object Detection. arXiv 2024, arXiv:2406.05835. [Google Scholar]
- Lu, S.; Chen, W.; Zhang, X.; Karkee, M. Canopy-Attention-YOLOv4-Based Immature/Mature Apple Fruit Detection on Dense-Foliage Tree Architectures for Early Crop Load Estimation. Comput. Electron. Agric. 2022, 193, 106696. [Google Scholar] [CrossRef]
- Zhang, Y.; Zhang, W.; Yu, J.; He, L.; Chen, J.; He, Y. Complete and Accurate Holly Fruits Counting Using YOLOX Object Detection. Comput. Electron. Agric. 2022, 198, 107062. [Google Scholar] [CrossRef]
- Wang, Y.; Yan, G.; Meng, Q.; Yao, T.; Han, J.; Zhang, B. DSE-YOLO: Detail Semantics Enhancement YOLO for Multi-Stage Strawberry Detection. Comput. Electron. Agric. 2022, 198, 107057. [Google Scholar] [CrossRef]
- Bhargava, A.; Bansal, A.; Goyal, V. Machine Learning-Based Detection and Sorting of Multiple Vegetables and Fruits. Food Anal. Methods 2022, 15, 228–242. [Google Scholar] [CrossRef]
- Gupta, S.; Tripathi, A.K. Fruit and Vegetable Disease Detection and Classification: Recent Trends, Challenges, and Future Opportunities. Eng. Appl. Artif. Intell. 2024, 133, 108260. [Google Scholar] [CrossRef]
- López-García, F.; Andreu-García, G.; Blasco, J.; Aleixos, N.; Valiente, J.-M. Automatic Detection of Skin Defects in Citrus Fruits Using a Multivariate Image Analysis Approach. Comput. Electron. Agric. 2010, 71, 189–197. [Google Scholar] [CrossRef]
- Bulanon, D.M.; Kataoka, T. Fruit Detection System and an End Effector for Robotic Harvesting of Fuji Apples. Agric. Eng. Int. CIGR J. 2010, 12, 203–210. [Google Scholar]
- Sengupta, S.; Lee, W.S. Identification and Determination of the Number of Immature Green Citrus Fruit in a Canopy Under Different Ambient Light Conditions. Biosyst. Eng. 2014, 117, 51–61. [Google Scholar] [CrossRef]
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Jocher, G. YOLOv5 Release v7.0. GitHub Repository. 2022. Available online: https://github.com/ultralytics/yolov5/tree/v7.0 (accessed on 15 December 2024).
- Li, C.; Li, L.; Geng, Y.; Jiang, H.; Cheng, M.; Zhang, B.; Ke, Z.; Xu, X.; Chu, X. YOLOv6 v3.0: A Full-Scale Reloading. arXiv 2023, arXiv:2301.05586. [Google Scholar]
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
- Jocher, G. YOLOv8. GitHub Repository. 2023. Available online: https://github.com/ultralytics/ultralytics/tree/main (accessed on 15 December 2024).
- Wang, C.; He, W.; Nie, Y.; Guo, J.; Liu, C.; Wang, Y.; Han, K. Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism. Adv. Neural Inf. Process. Syst. 2024, 36, 51094–51112. [Google Scholar]
- Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2025; pp. 1–21. [Google Scholar]
- Wang, D.; He, D. Channel Pruned YOLO v5s-Based Deep Learning Approach for Rapid and Accurate Apple Fruitlet Detection Before Fruit Thinning. Biosyst. Eng. 2021, 210, 271–281. [Google Scholar] [CrossRef]
- Chen, W.; Liu, M.; Zhao, C.; Li, X.; Wang, Y. MTD-YOLO: Multi-Task Deep Convolutional Neural Network for Cherry Tomato Fruit Bunch Maturity Detection. Comput. Electron. Agric. 2024, 216, 108533. [Google Scholar]
- Wang, J.; Liu, M.; Du, Y.; Zhao, M.; Jia, H.; Guo, Z.; Su, Y.; Lu, D.; Liu, Y. PG-YOLO: An Efficient Detection Algorithm for Pomegranate Before Fruit Thinning. Eng. Appl. Artif. Intell. 2024, 134, 108700. [Google Scholar]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229. [Google Scholar]
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 30 April 2020. [Google Scholar]
- Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Ni, L.; Shum, H.-Y. DINO: DETR with Improved Denoising Anchor Boxes for End-to-End Object Detection. In Proceedings of the International Conference on Learning Representations, Online, 25–29 April 2022. [Google Scholar]
- Sun, P.; Jiang, Y.; Xie, E.; Shao, W.; Yuan, Z.; Wang, C.; Luo, P. What Makes for End-to-End Object Detection? In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 9934–9944. [Google Scholar]
- Wang, J.; Song, L.; Li, Z.; Sun, H.; Sun, J.; Zheng, N. End-to-End Object Detection with Fully Convolutional Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15849–15858. [Google Scholar]
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974. [Google Scholar]
- Gu, Z.; Ma, X.; Guan, H.; Jiang, Q.; Deng, H.; Wen, B.; Zhu, T.; Wu, X. Tomato Fruit Detection and Phenotype Calculation Method Based on the Improved RTDETR Model. Comput. Electron. Agric. 2024, 227, 109524. [Google Scholar] [CrossRef]
- Huang, Z.; Zhang, X.; Wang, H.; Wei, H.; Zhang, Y.; Zhou, G. Pear Fruit Detection Model in Natural Environment Based on Lightweight Transformer Architecture. Agriculture 2024, 15, 24. [Google Scholar] [CrossRef]
- Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Jiao, J.; Liu, Y. VMamba: Visual State Space Model. arXiv 2024, arXiv:2401.10166. [Google Scholar]
- Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Loy, C.C.; Lin, D. CARAFE: Content-Aware Reassembly of Features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3007–3016. [Google Scholar]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768. [Google Scholar]
- Gupta, A.; Dollar, P.; Girshick, R. LVIS: A Dataset for Large Vocabulary Instance Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 5356–5364. Available online: https://www.kaggle.com/datasets/henningheyen/lvis-fruits-and-vegetables-dataset (accessed on 29 October 2024).
- James, J.A.; Manching, H.K.; Mattia, M.R.; Bowman, K.D.; Hulse-Kemp, A.M.; Beksi, W.J. CitDet: A Benchmark Dataset for Citrus Fruit Detection. IEEE Robot. Autom. Lett. 2024, 9, 10788–10795. [Google Scholar] [CrossRef]
- Xu, S.; Wang, X.; Lv, W.; Chang, Q.; Cui, C.; Deng, K.; Wang, G.; Dang, Q.; Wei, S.; Du, Y.; et al. PP-YOLOE: An Evolved Version of YOLO. arXiv 2022, arXiv:2203.16250. [Google Scholar]
- Jocher, G. YOLOv11. GitHub Repository. 2024. Available online: https://github.com/ultralytics/ultralytics (accessed on 27 November 2024).
- Tian, Y.; Ye, Q.; Doermann, D. YOLOv12: Attention-Centric Real-Time Object Detectors. arXiv 2025, arXiv:2502.12524. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Wang, Y.; Zhang, X.; Yang, T.; Sun, J. Anchor DETR: Query Design for Transformer-Based Detector. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 2567–2575. [Google Scholar]
- Meng, D.; Chen, X.; Fan, Z.; Zeng, G.; Li, H.; Yuan, Y.; Sun, L.; Wang, J. Conditional DETR for Fast Training Convergence. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 3651–3660. [Google Scholar]
- Yao, Z.; Ai, J.; Li, B.; Zhang, C. Efficient DETR: Improving End-to-End Object Detector with Dense Prior. arXiv 2021, arXiv:2104.01318. [Google Scholar]
- Gao, P.; Zheng, M.; Wang, X.; Dai, J.; Li, H. Fast Convergence of DETR with Spatially Modulated Co-Attention. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 3621–3630. [Google Scholar]
- Liu, S.; Li, F.; Zhang, H.; Yang, X.; Qi, X.; Su, H.; Zhu, J.; Zhang, L. DAB-DETR: Dynamic Anchor Boxes Are Better Queries for DETR. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 4 May 2021. [Google Scholar]
- Li, F.; Zhang, H.; Liu, S.; Guo, J.; Ni, L.M.; Zhang, L. DN-DETR: Accelerate DETR Training by Introducing Query Denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 13619–13627. [Google Scholar]
- Chen, Q.; Su, X.; Zhang, X.; Wang, J.; Chen, J.; Shen, Y.; Han, C.; Chen, Z.; Xu, W.; Li, F.; et al. LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection. arXiv 2024, arXiv:2406.03459. [Google Scholar]
- Peng, Y.; Li, H.; Wu, P.; Zhang, Y.; Sun, X.; Wu, F. D-FINE: Redefine Regression Task in DETRs as Fine-Grained Distribution Refinement. arXiv 2024, arXiv:2410.13842. [Google Scholar]
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
- Lv, W.; Zhao, Y.; Chang, Q.; Huang, K.; Wang, G.; Liu, Y. RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer. arXiv 2024, arXiv:2407.17140. [Google Scholar]
- Howard, A.G. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520. [Google Scholar]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
- Qin, D.; Leichner, C.; Delakis, M.; Fornoni, M.; Luo, S.; Yang, F.; Wang, W.; Banbury, C.; Ye, C.; Akin, B.; et al. MobileNetV4: Universal Models for the Mobile Ecosystem. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2025; pp. 78–96. [Google Scholar]
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856. [Google Scholar]
- Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131. [Google Scholar]
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 1580–1589. [Google Scholar]
- Tang, Y.; Han, K.; Guo, J.; Xu, C.; Xu, C.; Wang, Y. GhostNetV2: Enhance Cheap Operation with Long-Range Attention. Adv. Neural Inf. Process. Syst. 2022, 35, 9969–9982. [Google Scholar]
- Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
- Liu, X.; Peng, H.; Zheng, N.; Yang, Y.; Hu, H.; Yuan, Y. EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 14420–14430. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Cai, Y.; Zhou, Y.; Han, Q.; Sun, J.; Kong, X.; Li, J.; Zhang, X. Reversible Column Networks. arXiv 2022, arXiv:2212.11696. [Google Scholar]
- Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. ConvNeXt V2: Co-Designing and Scaling ConvNets with Masked Autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 16133–16142. [Google Scholar]
- Shi, D. TransNeXt: Robust Foveal Visual Perception for Vision Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–18 June 2024; pp. 17773–17783. [Google Scholar]
- Chen, H.; Wang, Y.; Guo, J.; Tao, D. VanillaNet: The Power of Minimalism in Deep Learning. Adv. Neural Inf. Process. Syst. 2024, 36, 7050–7064. [Google Scholar]
- Wang, A.; Chen, H.; Lin, Z.; Han, J.; Ding, G. RepViT: Revisiting Mobile CNN from ViT Perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–18 June 2024; pp. 15909–15920. [Google Scholar]
- Dong, X.; Bao, J.; Chen, D.; Zhang, W.; Yu, N.; Yuan, L.; Chen, D.; Guo, B. CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12124–12134. [Google Scholar]
- Chen, J.; Kao, S.-H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.-H.; Chan, S.-H.G. Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 12021–12031. [Google Scholar]
- Ding, X.; Zhang, Y.; Ge, Y.; Zhao, S.; Song, L.; Yue, X.; Shan, Y. UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series, and Image Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–18 June 2024; pp. 5513–5524. [Google Scholar]
- Li, Y.; Hu, J.; Wen, Y.; Evangelidis, G.; Salahi, K.; Wang, Y.; Tulyakov, S.; Ren, J. Rethinking Vision Transformers for MobileNet Size and Speed. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 16889–16900. [Google Scholar]
- Zhang, J.; Li, X.; Li, J.; Liu, L.; Xue, Z.; Zhang, B.; Jiang, Z.; Huang, T.; Wang, Y.; Wang, C. Rethinking Mobile Block for Efficient Attention-Based Models. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; IEEE Computer Society: Washington, DC, USA, 2023; pp. 1389–1400. [Google Scholar]
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10781–10790. [Google Scholar]
- Chen, Y.; Zhang, C.; Chen, B.; Huang, Y.; Sun, Y.; Wang, C.; Fu, X.; Dai, Y.; Qin, F.; Peng, Y.; et al. Accurate Leukocyte Detection Based on Deformable-DETR and Multi-Level Feature Fusion for Aiding Diagnosis of Blood Diseases. Comput. Biol. Med. 2024, 170, 107917. [Google Scholar] [CrossRef] [PubMed]
Hyperparameter | Value |
---|---|
Optimizer | AdamW |
Base Learning Rate (All) | 0.0001 |
Weight Decay | 0.0001 |
Batch Size | 8 |
Epochs | 200 |
Warmup Epochs | 2000 |
Learning Rate Scheduler | Cosine decay |
Warmup Momentum | 0.8 |
IoU Threshold | 0.7 |
Box Loss Weight | 7.5 |
Classification Loss Weight | 0.5 |
Distribution Focal Loss Weight | 1.5
Max Detected Objects | 300 |
Class Loss Weight | 1.0 |
Nominal Batch Size | 64 |
Image Size | 640 |
Data Augmentation | Mosaic, Mixup |
Mask Ratio | 4 |
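
The hyperparameters above map closely onto the training arguments exposed by Ultralytics-style interfaces used by many recent YOLO and RT-DETR codebases; the snippet below is a hedged illustration of how such a run could be configured, not the paper's released training script. The dataset file `FV40.yaml`, the starting checkpoint, and the augmentation strengths are assumptions for illustration, and the ambiguous "Warmup Epochs | 2000" entry is omitted rather than guessed.

```python
# Hypothetical mapping of the hyperparameter table to an Ultralytics-style
# train() call. "FV40.yaml" and the checkpoint name are illustrative
# assumptions, not artifacts released with the paper.
from ultralytics import RTDETR

model = RTDETR("rtdetr-l.pt")  # stand-in checkpoint; FVRT-DETR itself is not this model

model.train(
    data="FV40.yaml",       # assumed dataset config covering the 40 FV40 classes
    epochs=200,
    batch=8,
    imgsz=640,
    optimizer="AdamW",
    lr0=1e-4,               # base learning rate
    weight_decay=1e-4,
    cos_lr=True,            # cosine learning-rate decay
    warmup_momentum=0.8,
    iou=0.7,                # IoU threshold (used during validation)
    box=7.5,                # box loss weight
    cls=0.5,                # classification loss weight
    dfl=1.5,                # distribution focal loss weight
    max_det=300,            # maximum detected objects
    nbs=64,                 # nominal batch size
    mosaic=1.0,             # Mosaic augmentation (strength assumed)
    mixup=0.1,              # Mixup augmentation (strength assumed)
)
```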
Column groups: Efficiency covers Params, FLOPs, and FPS; Average Accuracy covers Precision, Recall, and the remaining accuracy columns.

Model | Backbone | Params | FLOPs | FPS | Precision | Recall | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Real-time Object Detectors | |||||||||||||
YOLOv5-L [22] | - | 46 M | 109 G | 54 | 76.7 | 73.6 | 63.9 | 81.1 | 68.2 | 62.4 | 53.7 | 65.4 | 71.7 |
YOLOv5-X [22] | - | 86 M | 205 G | 43 | 78.9 | 75.1 | 65.8 | 85.2 | 70.0 | 63.1 | 56.7 | 66.9 | 74.4 |
PPYOLOE-L [44] | - | 52 M | 110 G | 94 | 79.5 | 75.3 | 66.4 | 85.4 | 70.1 | 63.5 | 57.4 | 67.6 | 75.0 |
PPYOLOE-X [44] | - | 98 M | 206 G | 60 | 80.1 | 75.4 | 67.9 | 86.6 | 71.2 | 64.2 | 58.3 | 68.7 | 76.2 |
YOLOv6-L [23] | - | 59 M | 150 G | 99 | 80.4 | 75.9 | 68.3 | 87.0 | 71.5 | 64.8 | 58.9 | 69.7 | 77.1 |
YOLOv7-L [24] | - | 36 M | 104 G | 55 | 79.7 | 75.1 | 67.0 | 85.7 | 70.6 | 64.1 | 58.0 | 68.3 | 75.3 |
YOLOv7-X [24] | - | 71 M | 189 G | 45 | 81.2 | 77.3 | 67.6 | 87.0 | 71.9 | 65.2 | 59.0 | 69.6 | 76.2 |
YOLOv8-L [25] | - | 43 M | 165 G | 71 | 81.2 | 77.6 | 68.2 | 87.3 | 72.0 | 65.6 | 59.5 | 70.1 | 76.6 |
YOLOv8-X [25] | - | 68 M | 257 G | 50 | 81.7 | 78.7 | 69.5 | 88.0 | 73.1 | 66.8 | 60.3 | 70.8 | 77.3 |
YOLOv9-C [27] | - | 25 M | 102 G | 143 | 82.3 | 79.0 | 69.8 | 88.4 | 73.7 | 67.2 | 60.9 | 71.3 | 77.7 |
YOLOv9-E [27] | - | 57 M | 189 G | 60 | 82.8 | 79.6 | 70.4 | 89.1 | 73.7 | 67.0 | 61.4 | 72.0 | 78.0 |
Mamba YOLO-T [12] | Mamba-T | 5.8 M | 13.2 G | 161 | 81.4 | 77.4 | 68.3 | 87.5 | 71.7 | 65.9 | 59.6 | 70.3 | 77.0 |
Mamba YOLO-B [12] | Mamba-B | 19.1 M | 45.4 G | 161 | 82.5 | 77.9 | 69.0 | 88.0 | 72.3 | 66.2 | 60.0 | 71.8 | 78.3 |
Mamba YOLO-L [12] | Mamba-L | 57.6 M | 156.2 G | 161 | 83.6 | 78.9 | 70.3 | 88.8 | 73.5 | 67.3 | 61.2 | 72.0 | 79.1 |
YOLOv11-L [45] | - | 25 M | 87 G | 161 | 83.4 | 78.5 | 70.0 | 87.8 | 72.4 | 66.9 | 60.9 | 71.6 | 78.8 |
YOLOv11-X [45] | - | 57 M | 195 G | 89 | 84.3 | 79.0 | 71.0 | 89.3 | 73.9 | 67.7 | 61.7 | 72.8 | 79.7 |
YOLOv12-L [46] | - | 26 M | 89 G | 150 | 83.4 | 78.5 | 70.2 | 87.4 | 72.5 | 66.9 | 60.3 | 71.0 | 79.2 |
YOLOv12-X [46] | - | 59 M | 199 G | 80 | 83.7 | 79.1 | 71.1 | 89.0 | 73.7 | 67.3 | 61.9 | 72.5 | 79.6 |
End-to-end Object Detectors | |||||||||||||
DETR-DC5 [31] | R50 [47] | 41 M | 187 G | - | 70.9 | 67.8 | 57.8 | 76.3 | 62.9 | 58.4 | 50.3 | 60.9 | 66.9 |
DETR-DC5 [31] | R101 [47] | 60 M | 253 G | - | 72.2 | 69.0 | 58.4 | 76.5 | 63.5 | 59.6 | 52.0 | 61.4 | 67.7 |
Anchor-DETR-DC5 [48] | R50 [47] | 39 M | 172 G | - | 71.7 | 68.6 | 58.0 | 75.8 | 62.9 | 58.6 | 51.7 | 60.6 | 67.0 |
Anchor-DETR-DC5 [48] | R101 [47] | - | - | - | 72.5 | 69.8 | 59.1 | 76.4 | 63.7 | 59.4 | 52.9 | 61.7 | 67.8 |
Conditional-DETR-DC5 [49] | R50 [47] | 44 M | 195 G | - | 72.8 | 69.6 | 59.3 | 76.3 | 63.8 | 59.6 | 53.4 | 62.5 | 68.9 |
Conditional-DETR-DC5 [49] | R101 [47] | 63 M | 262 G | - | 73.7 | 70.4 | 60.0 | 77.1 | 64.3 | 60.4 | 53.6 | 63.6 | 69.4 |
Efficient-DETR [50] | R50 [47] | 35 M | 210 G | - | 72.7 | 69.5 | 59.7 | 76.8 | 64.0 | 60.5 | 53.6 | 62.9 | 69.2 |
Efficient-DETR [50] | R101 [47] | 54 M | 289 G | - | 73.4 | 70.1 | 59.9 | 77.2 | 64.5 | 61.3 | 54.0 | 63.2 | 69.5 |
SMCA-DETR [51] | R50 [47] | 40 M | 152 G | - | 73.2 | 70.0 | 59.5 | 77.0 | 64.0 | 60.7 | 54.5 | 63.0 | 70.0 |
SMCA-DETR [51] | R101 [47] | 58 M | 218 G | - | 73.5 | 71.0 | 60.1 | 77.8 | 64.6 | 61.3 | 55.0 | 63.6 | 70.4 |
Deformable-DETR [32] | R50 [47] | 40 M | 173 G | - | 74.0 | 71.3 | 60.6 | 78.3 | 64.8 | 61.4 | 55.4 | 64.0 | 70.5 |
DAB-Deformable-DETR [52] | R50 [47] | 48 M | 195 G | - | 74.9 | 72.5 | 61.4 | 78.9 | 65.3 | 63.0 | 56.0 | 64.6 | 71.6 |
DAB-Deformable-DETR++ [52] | R50 [47] | 47 M | - | - | 75.7 | 73.0 | 63.3 | 79.8 | 67.4 | 63.5 | 57.7 | 66.2 | 73.3 |
DN-Deformable-DETR [53] | R50 [47] | 48 M | 195 G | - | 75.9 | 73.4 | 63.5 | 80.3 | 67.7 | 63.8 | 58.3 | 66.8 | 73.5 |
DN-Deformable-DETR++ [53] | R50 [47] | 47 M | - | - | 77.3 | 74.6 | 64.7 | 81.8 | 68.9 | 64.4 | 60.0 | 67.9 | 75.2 |
DINO-Deformable-DETR [33] | R50 [47] | 47 M | 279 G | 5 | 78.8 | 76.3 | 66.2 | 83.5 | 70.2 | 66.0 | 61.7 | 69.7 | 77.3 |
LW-DETR-L [54] | - | 47 M | 72 G | 110 | 81.0 | 78.2 | 69.9 | 87.2 | 72.6 | 67.7 | 61.3 | 72.0 | 78.3 |
LW-DETR-X [54] | - | 118 M | 174 G | 50 | 82.2 | 79.3 | 71.3 | 89.0 | 73.4 | 68.3 | 62.0 | 72.7 | 79.9 |
D-FINE-L [55] | - | 31 M | 91 G | 127 | 81.6 | 78.4 | 70.4 | 88.1 | 72.7 | 67.7 | 60.4 | 72.4 | 79.1 |
D-FINE-X [55] | - | 62 M | 202 G | 80 | 81.7 | 80.2 | 71.2 | 89.1 | 74.0 | 68.4 | 61.9 | 71.7 | 80.1 |
Real-time End-to-end Object Detectors | |||||||||||||
YOLOv10-B [56] | - | 20.5 M | 98.7 G | 164 | 79.4 | 76.3 | 67.5 | 85.8 | 70.1 | 66.7 | 57.5 | 68.2 | 74.7 |
RT-DETR [36] | R34 [47] | 31.4 M | 90.6 G | 173 | 80.2 | 76.6 | 68.0 | 86.0 | 70.0 | 66.3 | 57.3 | 68.7 | 75.2 |
RT-DETRv2 [57] | R34 [47] | 36 M | 100 G | 145 | 80.9 | 77.3 | 68.4 | 86.9 | 70.5 | 67.0 | 57.6 | 68.9 | 75.1 |
FVRT-DETR-T | Mamba-T | 17.0 M | 91.5 G | 170 | 82.5 | 77.3 | 68.9 | 87.2 | 70.9 | 68.2 | 58.4 | 70.2 | 77.9 |
YOLOv10-L [56] | - | 25.8 M | 127.2 G | 137 | 81.6 | 77.5 | 68.2 | 87.9 | 72.0 | 68.9 | 59.3 | 70.5 | 77.2 |
RT-DETR [36] | R50 [47] | 42.8 M | 134.4 G | 108 | 81.9 | 78.2 | 68.2 | 87.8 | 72.2 | 69.0 | 59.0 | 71.3 | 77.6 |
RT-DETRv2 [57] | R50 [47] | 42 M | 136 G | 108 | 82.4 | 78.5 | 68.3 | 87.9 | 72.4 | 69.6 | 59.0 | 71.6 | 77.9 |
FVRT-DETR-B | Mamba-B | 27.1 M | 145.8 G | 95 | 83.6 | 78.3 | 70.0 | 88.3 | 72.1 | 70.0 | 59.2 | 71.1 | 78.8 |
YOLOv10-X [56] | - | 31.7 M | 171 G | 93 | 82.4 | 78.3 | 68.5 | 88.5 | 72.6 | 69.4 | 60.7 | 71.9 | 78.6 |
RT-DETR [36] | R101 [47] | 76.5 M | 257.3 G | 74 | 83.0 | 79.1 | 69.0 | 89.6 | 73.9 | 70.3 | 61.5 | 72.7 | 79.2 |
RT-DETRv2 [57] | R101 [47] | 76 M | 259 G | 74 | 83.1 | 79.0 | 68.7 | 89.8 | 74.0 | 70.7 | 61.5 | 72.5 | 79.3 |
FVRT-DETR-L | Mamba-L | 44.6 M | 270.1 G | 63 | 84.6 | 80.4 | 71.6 | 90.4 | 74.6 | 71.6 | 62.5 | 74.0 | 80.5 |
Model | Backbone | Params | FLOPs | FPS | Precision | Recall | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Real-time Object Detectors | |||||||||||||
YOLOv11-L [45] | - | 25 M | 87 G | 161 | 80.5 | 74.9 | 67.2 | 87.3 | 70.9 | 65.8 | 60.7 | 71.4 | 77.6 |
YOLOv11-X [45] | - | 57 M | 195 G | 89 | 81.6 | 76.1 | 68.1 | 87.8 | 71.2 | 66.6 | 61.4 | 72.3 | 78.6 |
YOLOv12-L [46] | - | 26 M | 89 G | 150 | 80.3 | 74.6 | 67.7 | 87.6 | 71.3 | 65.9 | 60.6 | 71.2 | 78.0 |
YOLOv12-X [46] | - | 59 M | 199 G | 80 | 82.3 | 76.3 | 68.4 | 88.7 | 73.0 | 66.0 | 61.6 | 72.0 | 79.3 |
End-to-end Object Detectors | |||||||||||||
LW-DETR-L [54] | - | 47 M | 72 G | 110 | 78.6 | 76.0 | 67.7 | 84.9 | 70.3 | 65.2 | 58.8 | 69.5 | 75.8 |
LW-DETR-X [54] | - | 118 M | 174 G | 50 | 79.9 | 76.8 | 68.8 | 86.6 | 70.9 | 65.9 | 59.5 | 70.4 | 77.5 |
D-FINE-L [55] | - | 31 M | 91 G | 127 | 79.2 | 76.0 | 67.9 | 85.6 | 70.3 | 65.2 | 57.9 | 69.9 | 76.6 |
D-FINE-X [55] | - | 62 M | 202 G | 80 | 79.3 | 77.7 | 68.8 | 86.8 | 71.5 | 66.0 | 59.4 | 69.3 | 77.7 |
Real-time End-to-end Object Detectors | |||||||||||||
RT-DETR [36] | R34 [47] | 31.4 M | 90.6 G | 173 | 78.2 | 74.1 | 65.6 | 83.7 | 67.9 | 64.2 | 55.3 | 66.3 | 72.9 |
RT-DETRv2 [57] | R34 [47] | 36 M | 100 G | 145 | 78.5 | 75.3 | 65.9 | 84.5 | 68.4 | 64.9 | 55.5 | 66.7 | 72.8 |
FVRT-DETR-T | Mamba-T | 17.0 M | 91.5 G | 170 | 80.3 | 75.2 | 66.6 | 85.1 | 68.8 | 66.2 | 56.2 | 67.8 | 75.8 |
RT-DETR [36] | R50 [47] | 42.8 M | 134.4 G | 108 | 79.6 | 75.9 | 66.2 | 85.5 | 70.1 | 67.3 | 56.5 | 68.8 | 75.2 |
RT-DETRv2 [57] | R50 [47] | 42 M | 136 G | 108 | 80.2 | 76.5 | 66.7 | 85.7 | 70.3 | 67.4 | 57.6 | 69.1 | 75.8 |
FVRT-DETR-B | Mamba-B | 27.1 M | 145.8 G | 95 | 81.3 | 77.1 | 68.7 | 87.1 | 71.3 | 68.5 | 57.8 | 69.6 | 78.4 |
RT-DETR [36] | R101 [47] | 76.5 M | 257.3 G | 74 | 80.7 | 76.6 | 67.1 | 87.5 | 71.9 | 68.1 | 59.3 | 70.6 | 76.8 |
RT-DETRv2 [57] | R101 [47] | 76 M | 259 G | 74 | 80.9 | 76.9 | 66.4 | 87.7 | 71.6 | 68.7 | 59.4 | 70.1 | 77.2 |
FVRT-DETR-L | Mamba-L | 44.6 M | 270.1 G | 63 | 83.6 | 79.0 | 70.2 | 89.4 | 73.2 | 70.6 | 62.3 | 72.9 | 79.3 |
Model | Backbone | Params | FLOPs | FPS | Precision | Recall | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
RT-DETR [36] (Baseline) | R18 [47] | 20.0 M | 60.0 G | 211 | 79.2 | 76.1 | 67.8 | 85.9 | 69.8 | 56.7 | 67.8 | 74.9 |
RT-DETR [36] | R34 [47] | 31.4 M | 90.6 G | 173 | 80.2 | 76.6 | 68.0 | 86.0 | 70.0 | 57.3 | 68.7 | 75.2 |
RT-DETR [36] | R50 [47] | 42.8 M | 134.4 G | 108 | 81.9 | 78.2 | 68.2 | 87.8 | 72.2 | 59.0 | 71.3 | 77.6 |
RT-DETR [36] | R101 [47] | 76.5 M | 257.3 G | 74 | 83.0 | 79.1 | 69.0 | 89.6 | 73.9 | 61.5 | 72.7 | 79.2 |
RT-DETR [36] | MobileNetV1 [58] | 12.3 M | 34.6 G | 256 | 76.3 | 72.9 | 65.3 | 82.6 | 68.3 | 56.9 | 67.2 | 73.9 |
RT-DETR [36] | MobileNetV2 [59] | 10.6 M | 28.9 G | 274 | 76.5 | 72.6 | 65.9 | 83.0 | 68.0 | 57.5 | 67.3 | 74.3 |
RT-DETR [36] | MobileNetV3 [60] | 11.9 M | 27.8 G | 302 | 76.6 | 72.5 | 65.5 | 83.3 | 67.5 | 57.9 | 67.8 | 74.5 |
RT-DETR [36] | MobileNetV4 [61] | 11.5 M | 40.6 G | 220 | 76.3 | 73.1 | 66.0 | 83.4 | 67.3 | 58.2 | 67.7 | 74.5 |
RT-DETR [36] | ShuffleNetV1 [62] | 16.8 M | 39.1 G | 210 | 75.9 | 72.0 | 66.3 | 83.9 | 68.2 | 58.0 | 66.5 | 73.7 |
RT-DETR [36] | ShuffleNetV2 [63] | 9.8 M | 26.6 G | 239 | 74.7 | 71.2 | 65.1 | 82.5 | 67.6 | 56.9 | 65.4 | 73.3 |
RT-DETR [36] | GhostnetV1 [64] | 11.8 M | 26.8 G | 225 | 74.9 | 71.0 | 65.8 | 82.7 | 67.9 | 57.3 | 65.8 | 73.0 |
RT-DETR [36] | GhostnetV2 [65] | 12.7 M | 27.3 G | 217 | 75.4 | 71.5 | 66.3 | 82.4 | 68.2 | 57.7 | 66.0 | 73.5 |
RT-DETR [36] | EfficientNetV1 [66] | 14.3 M | 24.5 G | 195 | 75.8 | 72.0 | 67.2 | 83.0 | 69.5 | 57.0 | 65.7 | 73.8 |
RT-DETR [36] | EfficientViT [67] | 11.1 M | 28.5 G | 174 | 74.7 | 72.9 | 67.0 | 83.4 | 70.1 | 56.6 | 65.0 | 73.3 |
RT-DETR [36] | SwinTransformer [68] | 37.1 M | 99.2 G | 140 | 80.5 | 77.2 | 68.5 | 86.0 | 70.1 | 57.6 | 69.0 | 75.6 |
RT-DETR [36] | RevColV1 [69] | 69.3 M | 173.1 G | 89 | 82.5 | 78.0 | 68.7 | 88.0 | 72.6 | 59.5 | 71.3 | 77.7 |
RT-DETR [36] | ConvNeXtV2 [70] | 12.7 M | 33.1 G | 243 | 75.8 | 73.1 | 66.2 | 83.0 | 67.5 | 58.8 | 68.2 | 74.4 |
RT-DETR [36] | TransNeXt [71] | 21.5 M | 65.7 G | 184 | 80.3 | 78.2 | 66.9 | 87.4 | 71.7 | 58.6 | 70.0 | 77.3 |
RT-DETR [36] | VanillaNet [72] | 28.1 M | 116.7 G | 105 | 81.2 | 79.0 | 66.5 | 87.5 | 71.7 | 58.2 | 70.4 | 77.9 |
RT-DETR [36] | RepViT [73] | 13.8 M | 38.3 G | 230 | 76.5 | 72.6 | 65.7 | 82.4 | 68.6 | 57.0 | 67.6 | 74.2 |
RT-DETR [36] | CSWinTransformer [74] | 32.3 M | 91.3 G | 113 | 79.8 | 77.0 | 68.1 | 85.5 | 70.2 | 57.7 | 68.8 | 75.3 |
RT-DETR [36] | FasterNet [75] | 22.1 M | 56.5 G | 213 | 80.0 | 77.3 | 67.2 | 86.3 | 70.5 | 57.4 | 68.7 | 76.5 |
RT-DETR [36] | UniRepLknet [76] | 13.3 M | 35.2 G | 227 | 76.0 | 72.3 | 65.5 | 82.6 | 68.2 | 57.0 | 67.9 | 74.0 |
RT-DETR [36] | EfficientFormerV2 [77] | 12.2 M | 30.6 G | 236 | 75.6 | 72.0 | 65.7 | 82.5 | 68.1 | 56.5 | 67.5 | 74.1 |
RT-DETR [36] | EMO [78] | 13.5 M | 28.6 G | 240 | 75.7 | 72.2 | 65.4 | 82.1 | 68.2 | 56.5 | 67.7 | 74.3 |
RT-DETR [36] | Mamba-T | 9.1 M | 17.2 G | 375 | 80.0 | 76.3 | 68.1 | 86.3 | 70.4 | 57.6 | 68.0 | 75.1 |
RT-DETR [36] | Mamba-B | 23.6 M | 48.4 G | 232 | 81.7 | 78.5 | 68.0 | 87.7 | 72.0 | 59.4 | 71.6 | 77.3 |
RT-DETR [36] | Mamba-L | 57.8 M | 147.4 G | 97 | 83.2 | 79.5 | 68.7 | 89.5 | 73.0 | 61.2 | 72.7 | 79.3 |
Model | Encoder | Params | FLOPs | FPS | Precision | Recall | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
FVRT-DETR-T (Baseline) | PA-FPN [41] | 9.1 M | 17.2 G | 375 | 80.0 | 76.3 | 68.1 | 86.3 | 70.4 | 57.6 | 68.0 | 75.1 |
FVRT-DETR-B (Baseline) | PA-FPN [41] | 23.6 M | 48.4 G | 232 | 81.7 | 78.5 | 68.0 | 87.7 | 72.0 | 59.4 | 71.6 | 77.3 |
FVRT-DETR-L (Baseline) | PA-FPN [41] | 57.8 M | 147.4 G | 97 | 83.2 | 79.5 | 68.7 | 89.5 | 73.0 | 61.2 | 72.7 | 79.3 |
FVRT-DETR-T | Bi-FPN [79] | 9.9 M | 24.5 G | 367 | 80.1 | 76.8 | 68.4 | 86.1 | 70.3 | 57.0 | 68.2 | 75.5 |
FVRT-DETR-B | Bi-FPN [79] | 24.8 M | 54.5 G | 205 | 82.0 | 78.6 | 69.2 | 87.5 | 72.3 | 59.7 | 71.4 | 77.4 |
FVRT-DETR-L | Bi-FPN [79] | 58.4 M | 153.8 G | 90 | 83.5 | 79.7 | 68.1 | 89.8 | 72.9 | 60.5 | 72.6 | 79.3 |
FVRT-DETR-T | HS-FPN [80] | 8.0 M | 18.5 G | 362 | 79.9 | 76.8 | 68.0 | 86.5 | 70.1 | 56.9 | 68.4 | 75.1 |
FVRT-DETR-B | HS-FPN [80] | 22.8 M | 49.6 G | 227 | 81.5 | 77.9 | 68.4 | 86.8 | 71.5 | 59.1 | 71.2 | 76.8 |
FVRT-DETR-L | HS-FPN [80] | 56.5 M | 149.2 G | 86 | 83.0 | 79.0 | 68.7 | 88.7 | 73.0 | 60.7 | 72.4 | 78.9 |
FVRT-DETR-T | MDFF | 17.0 M | 91.5 G | 170 | 82.5 | 77.3 | 68.9 | 87.2 | 70.9 | 58.4 | 70.2 | 77.9 |
FVRT-DETR-B | MDFF | 27.1 M | 145.8 G | 95 | 83.6 | 78.3 | 70.0 | 88.3 | 72.1 | 59.2 | 71.1 | 78.8 |
FVRT-DETR-L | MDFF | 44.6 M | 270.1 G | 63 | 84.6 | 80.4 | 71.6 | 90.4 | 74.6 | 62.5 | 74.0 | 80.5 |
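
For context on the Efficiency columns (Params, FLOPs, FPS) reported throughout the tables, the following hedged sketch shows one conventional way to measure parameter count and single-image FPS for any PyTorch detector; FLOP counting additionally requires a profiler (e.g., fvcore or thop) and is not shown. The helper names and the toy network are illustrative assumptions, not the paper's measurement protocol.

```python
# Hedged sketch: measuring "Params" (in millions) and "FPS" for a torch module.
import time
import torch


def count_params_m(model: torch.nn.Module) -> float:
    """Trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6


@torch.no_grad()
def measure_fps(model, imgsz=640, runs=100, warmup=10, device="cpu"):
    """Average single-image inference throughput (frames per second)."""
    model = model.eval().to(device)
    x = torch.randn(1, 3, imgsz, imgsz, device=device)
    for _ in range(warmup):          # warm up kernels and caches
        model(x)
    if device.startswith("cuda"):
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    if device.startswith("cuda"):
        torch.cuda.synchronize()
    return runs / (time.perf_counter() - start)


if __name__ == "__main__":
    # Any detector can be plugged in; a tiny conv net keeps the demo self-contained.
    net = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3, 2, 1), torch.nn.SiLU())
    print(f"{count_params_m(net):.2f} M params, {measure_fps(net, imgsz=320):.1f} FPS")
```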
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).