CabbageNet: Deep Learning for High-Precision Cabbage Segmentation in Complex Settings for Autonomous Harvesting Robotics
Abstract
1. Introduction
- Construction of a cabbage instance segmentation dataset: Cabbage images from the harvest period were collected using image acquisition devices and search engines. After rigorous selection and processing, 10,000 images were annotated, creating a high-quality cabbage head instance segmentation dataset.
- Integration of deformable attention into the C2f module: The C2f module was enhanced by incorporating deformable attention with dynamically predicted sampling points, forming the C2f-DAttention module (a minimal sketch follows this list). This enables the model to adapt better to varying image sizes and content, improving both accuracy and efficiency in instance segmentation.
- Improvement of the downsampling process using the ADown module: The ADown module employs an adaptive dual-branch mechanism to retain essential information while capturing higher-level image features (see the ADown sketch after this list). Together with multi-scale feature fusion, it lets the model handle objects of varying sizes efficiently, improving both accuracy and robustness in instance segmentation.
- Enhancement of small-object segmentation using the SOEP module: The Small Object Enhance Pyramid (SOEP) applies Space-to-Depth Convolution (SPDConv) to the P2 layer to extract richer small-object features (see the SPDConv sketch after this list). Combined with CSP-OmniKernel feature aggregation, SOEP preserves critical small-object information, significantly improving segmentation performance on small objects.
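The key mechanism behind C2f-DAttention is that the attention's sampling locations are predicted from the features themselves rather than fixed on a regular grid, following the deformable-attention line of work cited below (DAT/DAT++). The following is a minimal single-head PyTorch sketch of that idea; the module and parameter names (`DeformableAttention2d`, `n_points`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableAttention2d(nn.Module):
    """Minimal sketch: attend over a few learned, input-dependent sampling points."""
    def __init__(self, channels: int, n_points: int = 4):
        super().__init__()
        self.n_points = n_points
        self.offset_pred = nn.Conv2d(channels, 2 * n_points, 3, padding=1)  # (dx, dy) per point
        self.attn_pred = nn.Conv2d(channels, n_points, 3, padding=1)        # weight per point
        self.value_proj = nn.Conv2d(channels, channels, 1)
        self.out_proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        value = self.value_proj(x)
        offsets = self.offset_pred(x).view(b, self.n_points, 2, h, w)  # learned offsets (pixels)
        weights = self.attn_pred(x).softmax(dim=1)                     # attention over points

        # Reference grid in the normalized [-1, 1] coordinates grid_sample expects.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=x.device),
            torch.linspace(-1, 1, w, device=x.device),
            indexing="ij",
        )
        base = torch.stack((xs, ys), dim=-1)  # (h, w, 2), (x, y) order

        out = 0.0
        for k in range(self.n_points):
            # Scale pixel offsets into normalized coordinates, shift the grid, sample.
            off = offsets[:, k].permute(0, 2, 3, 1)                    # (b, h, w, 2)
            off = off / torch.tensor([w, h], device=x.device) * 2.0
            grid = base.unsqueeze(0) + off
            sampled = F.grid_sample(value, grid, align_corners=True)  # bilinear sampling
            out = out + sampled * weights[:, k:k + 1]
        return self.out_proj(out)
```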
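ADown originates in YOLOv9: the feature map is lightly smoothed, split channel-wise, and downsampled along two complementary branches (a strided 3×3 convolution and a max pool followed by a 1×1 convolution), so different kinds of information survive the resolution drop. The sketch below follows that published structure; the `conv_bn_act` helper (Conv2d + BatchNorm + SiLU) is an assumed YOLO-style convenience, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_act(c_in: int, c_out: int, k: int, s: int, p: int) -> nn.Sequential:
    """YOLO-style conv block: Conv2d + BatchNorm + SiLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, p, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )

class ADown(nn.Module):
    """Dual-branch 2x downsampling (c1 must be even)."""
    def __init__(self, c1: int, c2: int):
        super().__init__()
        self.c = c2 // 2
        self.cv1 = conv_bn_act(c1 // 2, self.c, 3, 2, 1)  # strided-conv branch
        self.cv2 = conv_bn_act(c1 // 2, self.c, 1, 1, 0)  # pooled branch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Light smoothing before the split: 2x2 average pool with stride 1.
        x = F.avg_pool2d(x, 2, 1, 0, ceil_mode=False, count_include_pad=True)
        x1, x2 = x.chunk(2, dim=1)          # split channels into two halves
        x1 = self.cv1(x1)                   # half 1: 3x3 conv, stride 2
        x2 = F.max_pool2d(x2, 3, 2, 1)      # half 2: max pool, stride 2
        x2 = self.cv2(x2)                   # then a 1x1 conv
        return torch.cat((x1, x2), dim=1)   # both halves at half resolution

x = torch.randn(1, 64, 80, 80)
print(ADown(64, 128)(x).shape)  # torch.Size([1, 128, 40, 40])
```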
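SPDConv replaces strided downsampling with a lossless space-to-depth rearrangement followed by a non-strided convolution, which is why it preserves the fine detail that small cabbage heads occupy at the P2 level. A minimal sketch under stated assumptions: `PixelUnshuffle` is equivalent, up to channel ordering, to the slice-and-concatenate formulation in the SPD-Conv paper, and the P2 shape in the usage line assumes a 640×640 input.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Space-to-depth + non-strided conv: downsample without discarding pixels."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.space_to_depth = nn.PixelUnshuffle(2)  # (b, c, h, w) -> (b, 4c, h/2, w/2)
        self.conv = nn.Sequential(
            nn.Conv2d(4 * c_in, c_out, 3, 1, 1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.SiLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Every 2x2 spatial block is moved into channels, so the 2x downsampling
        # is lossless; the convolution then mixes the rearranged information.
        return self.conv(self.space_to_depth(x))

# Assumed P2 shape (stride 4) for a 640x640 input, brought to P3 resolution for fusion.
p2 = torch.randn(1, 64, 160, 160)
print(SPDConv(64, 128)(p2).shape)  # torch.Size([1, 128, 80, 80])
```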
2. Materials and Methods
2.1. Image Acquisition
2.2. Dataset Establishment
2.3. Network Structure
2.3.1. C2f Deformable Attention Block
2.3.2. ADown Block
2.3.3. Small Object Enhance Pyramid
3. Experiments and Results
3.1. Experimental Environment
3.2. Evaluation Metrics
3.3. Comparison Experiments
3.4. Ablation Experiments
3.5. Visualization and Analysis of Experiments
4. Conclusions and Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Tong, W.Y.; Zhang, J.F.; Song, Z.Y.; Cao, G.Q.; Jin, Y.; Ning, X.F. Research Status and Development Trend of Cabbage Mechanical Harvesting Equipment and Technology. J. Chin. Agric. Mech. 2024, 45, 322–329. [Google Scholar]
- Yang, J.H.; Fang, X.; Ma, L.X.; Zhou, C.; Shao, C.F. Research Status and Direction of Headed Vegetable Harvesting Machinery. J. Agric. Mech. Res. 2023, 45, 10–17. [Google Scholar] [CrossRef]
- Yang, J.H.; Du, Y.G.; Fang, X.; Zhou, C. Design and Experimental Study of Cabbage Picking and Conveying Device. J. Chin. Agric. Mech. 2024, 45, 32–36. [Google Scholar] [CrossRef]
- Ghazal, S.; Munir, A.; Qureshi, W.S. Computer vision in smart agriculture and precision farming: Techniques and applications. Artif. Intell. Agric. 2024, 13, 64–83. [Google Scholar] [CrossRef]
- Zou, L.L.; Liu, X.M.; Yuan, J.; Dong, X.H. Advances in Mechanized Harvesting Technology and Equipment for Leaf Vegetables. Chin. J. Agric. Mech. 2022, 43, 15–23. [Google Scholar] [CrossRef]
- Luo, Z.; Yang, W.; Yuan, Y.; Gou, R.; Li, X. Semantic Segmentation of Agricultural Images: A Survey. Inf. Process. Agric. 2024, 11, 172–186. [Google Scholar] [CrossRef]
- Charisis, C.; Argyropoulos, D. Deep Learning-Based Instance Segmentation Architectures in Agriculture: A Review of the Scopes and Challenges. Smart Agric. Technol. 2024, 8, 100448. [Google Scholar] [CrossRef]
- Yu, Y.; Wang, C.; Fu, Q.; Kou, R.; Huang, F.; Yang, B.; Yang, T.; Gao, M. Techniques and Challenges of Image Segmentation: A Review. Electronics 2023, 12, 1199. [Google Scholar] [CrossRef]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar] [CrossRef]
- Bolya, D.; Zhou, C.; Xiao, F.; Lee, Y.J. YOLACT: Real-time Instance Segmentation. In Proceedings of the International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders. arXiv 2023, arXiv:2301.00808. [Google Scholar]
- Li, Y.; Feng, Q.; Li, T.; Xie, F.; Liu, C.; Xiong, Z. Advance of Target Visual Information Acquisition Technology for Fresh Fruit Robotic Harvesting: A Review. Agronomy 2022, 12, 1336. [Google Scholar] [CrossRef]
- Kang, S.; Li, D.; Li, B.; Zhu, J.; Long, S.; Wang, J. Maturity Identification and Category Determination Method of Broccoli Based on Semantic Segmentation Models. Comput. Electron. Agric. 2024, 217, 108633. [Google Scholar] [CrossRef]
- Blok, P.M.; Barth, R.; van den Berg, W. Machine Vision for a Selective Broccoli Harvesting Robot. IFAC-PapersOnLine 2016, 49, 66–71. [Google Scholar] [CrossRef]
- Blok, P.M.; van Henten, E.J.; van Evert, F.K.; Kootstra, G. Image-Based Size Estimation of Broccoli Heads under Varying Degrees of Occlusion. Biosyst. Eng. 2021, 208, 213–233. [Google Scholar] [CrossRef]
- Kang, H.; Wang, X.; Chen, C. Geometry-Aware Fruit Grasping Estimation for Robotic Harvesting in Orchards. Comput. Electron. Agric. 2022, 193, 106716. [Google Scholar] [CrossRef]
- Shen, L.; Su, J.; Huang, R.; Quan, W.; Song, Y.; Fang, Y.; Su, B. Fusing Attention Mechanism with Mask R-CNN for Instance Segmentation of Grape Cluster in the Field. Front. Plant Sci. 2022, 13, 934450. [Google Scholar] [CrossRef]
- Wang, D.; He, D. Apple Detection and Instance Segmentation in Natural Environments Using an Improved Mask Scoring R-CNN Model. Front. Plant Sci. 2022, 13, 1016470. [Google Scholar] [CrossRef]
- Coll-Ribes, G.; Torres-Rodríguez, I.J.; Grau, A.; Guerra, E.; Sanfeliu, A. Accurate Detection and Depth Estimation of Table Grapes and Peduncles for Robot Harvesting, Combining Monocular Depth Estimation and CNN Methods. Comput. Electron. Agric. 2023, 215, 108362. [Google Scholar] [CrossRef]
- Lawal, O.M. YOLOv5-LiNet: A Lightweight Network for Fruits Instance Segmentation. PLoS ONE 2023, 18, e0282297. [Google Scholar] [CrossRef]
- Li, Y.; Feng, Q.; Liu, C.; Xiong, Z.; Sun, Y.; Xie, F.; Li, T.; Zhao, C. MTA-YOLACT: Multitask-Aware Network on Fruit Bunch Identification for Cherry Tomato Robotic Harvesting. Eur. J. Agron. 2023, 146, 126812. [Google Scholar] [CrossRef]
- Lüling, N.; Reiser, D.; Griepentrog, H.W. Volume and Leaf Area Calculation of Cabbage with a Neural Network-Based Instance Segmentation. In Precision Agriculture ’21; Wageningen Academic Publishers: Budapest, Hungary, 2021; pp. 719–726. [Google Scholar] [CrossRef]
- Lüling, N.; Reiser, D.; Stana, A.; Griepentrog, H.W. Using Depth Information and Colour Space Variations for Improving Outdoor Robustness for Instance Segmentation of Cabbage. arXiv 2021, arXiv:2103.16923. [Google Scholar] [CrossRef]
- Lüling, N.; Reiser, D.; Straub, J.; Stana, A.; Griepentrog, H.W. Fruit Volume and Leaf-Area Determination of Cabbage by a Neural-Network-Based Instance Segmentation for Different Growth Stages. Sensors 2022, 23, 129. [Google Scholar] [CrossRef] [PubMed]
- Asano, M.; Onishi, K.; Fukao, T. Robust Cabbage Recognition and Automatic Harvesting under Environmental Changes. Adv. Robot. 2023, 37, 960–969. [Google Scholar] [CrossRef]
- Cong, P.; Li, S.; Zhou, J.; Lv, K.; Feng, H. Research on Instance Segmentation Algorithm of Greenhouse Sweet Pepper Detection Based on Improved Mask RCNN. Agronomy 2023, 13, 196. [Google Scholar] [CrossRef]
- Wu, H.; Guo, W.; Liu, C.; Sun, X. A Study of Cabbage Recognition Based on Semantic Segmentation. Agronomy 2024, 14, 894. [Google Scholar] [CrossRef]
- Jia, W.; Li, Q.; Zhang, Z.; Liu, G.; Hou, S.; Ji, Z.; Zheng, Y. Optimized SOLO Segmentation Algorithm for the Green Fruits of Persimmons and Apples in Complex Environments. Trans. Chin. Soc. Agric. Eng. 2021, 37, 121–127. [Google Scholar]
- Sheng, X.; Kang, C.; Zheng, J.; Lyu, C. An edge-guided method to fruit segmentation in complex environments. Comput. Electron. Agric. 2023, 208, 107788. [Google Scholar] [CrossRef]
- Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8. Available online: https://github.com/ultralytics/ultralytics (accessed on 12 June 2024).
- X-AnyLabeling: Advanced Auto Labeling Solution with Added Features. Available online: https://github.com/CVHub520/X-AnyLabeling (accessed on 13 June 2024).
- Xia, Z.; Pan, X.; Song, S.; Li, E.L.; Huang, G. DAT++: Spatially Dynamic Vision Transformer with Deformable Attention. arXiv 2023, arXiv:2309.01430. [Google Scholar]
- Xia, Z.; Pan, X.; Song, S.; Li, E.L.; Huang, G. Vision Transformer with Deformable Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 4794–4803. [Google Scholar]
- Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616. [Google Scholar] [CrossRef]
- Sunkara, R.; Luo, T. No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects. arXiv 2022, arXiv:2208.03641. [Google Scholar] [CrossRef]
- Wang, C.Y.; Liao, H.Y.M.; Yeh, I.H.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W. CSPNet: A New Backbone That Can Enhance Learning Capability of CNN. arXiv 2019, arXiv:1911.11929. [Google Scholar] [CrossRef]
- Cui, Y.; Ren, W.; Knoll, A. Omni-Kernel Network for Image Restoration. Proc. AAAI Conf. Artif. Intell. 2024, 38, 27907. [Google Scholar] [CrossRef]
- Gupta, K.; Shakya, S.; Singla, A. Efficient Graph-Friendly COCO Metric Computation for Train-Time Model Evaluation. arXiv 2022, arXiv:2207.12120. [Google Scholar]
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: High Quality Object Detection and Instance Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1483–1498. [Google Scholar] [CrossRef] [PubMed]
- Huang, Z.; Huang, L.; Gong, Y.; Huang, C.; Wang, X. Mask Scoring R-CNN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Chen, K.; Pang, J.; Wang, J.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Shi, J.; Ouyang, W.; et al. Hybrid task cascade for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Wang, X.; Kong, T.; Shen, C.; Jiang, Y.; Li, L. SOLO: Segmenting Objects by Locations. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020. [Google Scholar]
- Kirillov, A.; Wu, Y.; He, K.; Girshick, R. PointRend: Image Segmentation as Rendering. arXiv 2019, arXiv:1912.08193. [Google Scholar]
- Wang, X.; Zhang, R.; Kong, T.; Li, L.; Shen, C. SOLOv2: Dynamic and Fast Instance Segmentation. Adv. Neural Inf. Process. Syst. (NeurIPS) 2020, 33, 17721–17732. [Google Scholar]
- Fang, Y.; Yang, S.; Wang, X.; Li, Y.; Fang, C.; Shan, Y.; Feng, B.; Liu, W. Instances As Queries. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 6910–6919. [Google Scholar]
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment Anything. arXiv 2023, arXiv:2304.02643. [Google Scholar]
- Ravi, N.; Gabeur, V.; Hu, Y.T.; Hu, R.; Ryali, C.; Ma, T.; Khedr, H.; Rädle, R.; Rolland, C.; Gustafson, L.; et al. SAM 2: Segment Anything in Images and Videos. arXiv 2024, arXiv:2408.00714. [Google Scholar] [CrossRef]
- Zhang, C.; Han, D.; Qiao, Y.; Kim, J.U.; Bae, S.H.; Lee, S.; Hong, C.S. Faster Segment Anything: Towards Lightweight SAM for Mobile Applications. arXiv 2023, arXiv:2306.14289. [Google Scholar]
- Zhao, X.; Ding, W.; An, Y.; Du, Y.; Yu, T.; Li, M.; Tang, M.; Wang, J. Fast Segment Anything. arXiv 2023, arXiv:2306.12156. [Google Scholar]
- Jocher, G. Ultralytics YOLOv5. Available online: https://github.com/ultralytics/yolov5 (accessed on 27 June 2024).
Comparison of CabbageNet with mainstream instance segmentation models:

| Models | Mask AP0.5:0.95/% | Mask AP0.5/% | Mask AP0.75/% | Mask APsmall/% | Params/M | GFLOPs | FPS |
|---|---|---|---|---|---|---|---|
| Mask R-CNN | 74.1 | 90.8 | 80.2 | 36.3 | 43.9 | 284.6 | 21 |
| Cascade Mask R-CNN [39] | 74.5 | 90.2 | 80.4 | 35.5 | 76.8 | 3323.2 | 12 |
| Mask Scoring R-CNN [40] | 75.2 | 89.8 | 80.5 | 35.3 | 60.4 | 366.6 | 23 |
| Hybrid Task Cascade [41] | 75.9 | 91.7 | 81.8 | 38.0 | 79.9 | 3506.0 | 6 |
| YOLACT | 68.2 | 89.4 | 73.6 | 31.0 | 34.7 | 163.2 | 21 |
| SOLO [42] | 58.2 | 78.1 | 62.8 | 7.0 | 35.9 | 311.5 | 11 |
| PointRend [43] | 73.3 | 90.5 | 79.6 | 35.1 | 55.9 | 184.1 | 18 |
| SOLOv2 [44] | 49.3 | 76.1 | 50.7 | 7.8 | 46.0 | 276.7 | 12 |
| QueryInst [45] | 68.4 | 85.0 | 74.6 | 24.4 | 172.2 | 135.6 | 10 |
| ConvNeXt-V2 | 75.8 | 89.0 | 80.5 | 34.3 | 108.1 | 469.6 | 6 |
| CabbageNet (Ours) | 78.9 | 94.0 | 85.4 | 38.7 | 3.21 | 15.1 | 154 |
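The AP columns above follow the COCO protocol: mask AP averaged over IoU thresholds 0.50–0.95 in steps of 0.05, plus AP at fixed thresholds (0.5, 0.75) and for small objects. Results in this form are typically produced with pycocotools, as in the sketch below; the two file paths are placeholders, not artifacts from this paper.

```python
# COCO-style mask AP with pycocotools; paths are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_val.json")    # ground-truth annotations
coco_dt = coco_gt.loadRes("predictions_segm.json")  # detections in COCO results format
ev = COCOeval(coco_gt, coco_dt, iouType="segm")     # "segm" scores masks, not boxes
ev.evaluate()
ev.accumulate()
ev.summarize()  # prints AP@[0.5:0.95], AP50, AP75, AP_small, AP_medium, AP_large
```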
Comparison of CabbageNet with SAM-family models:

| Models | IoU/% | Dice/% | F1/% | Hausdorff Distance/pixels | PA/% | FPS |
|---|---|---|---|---|---|---|
| SAM (base) [46] | 70.2 | 74.2 | 74.1 | 73.8 | 96.8 | – |
| SAM2 (2.1 base) [47] | 49.9 | 51.9 | 51.9 | 175.4 | 94.9 | – |
| MobileSAM [48] | 76.1 | 80.3 | 80.3 | 58.45 | 97.2 | – |
| FastSAM (s) [49] | 71.3 | 76.6 | 76.6 | 80.3 | 96.9 | 63 |
| CabbageNet (Ours) | 85.3 | 90.0 | 90.0 | 27.1 | 99.4 | 154 |
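IoU, Dice, F1, pixel accuracy (PA), and Hausdorff distance in this table are standard mask-quality measures. The sketch below shows one common way to compute them for a single predicted/ground-truth mask pair; it is not the paper's evaluation code. Note that for binary masks Dice and F1 coincide, consistent with the near-identical columns above.

```python
# Mask-quality metrics for one predicted/ground-truth pair; a sketch that
# assumes both masks are non-empty binary arrays of the same shape.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def mask_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = inter / union                         # intersection over union
    dice = 2 * inter / (pred.sum() + gt.sum())  # equals F1 for binary masks
    pa = (pred == gt).mean()                    # pixel accuracy over the image

    # Symmetric Hausdorff distance between the two sets of mask pixels, in pixels.
    p = np.argwhere(pred).astype(float)
    g = np.argwhere(gt).astype(float)
    hd = max(directed_hausdorff(p, g)[0], directed_hausdorff(g, p)[0])
    return {"IoU": iou, "Dice": dice, "F1": dice, "PA": pa, "Hausdorff": hd}
```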
Comparison of CabbageNet with YOLO-family segmentation models:

| Models | Mask Precision/% | Mask Recall/% | Mask mAP50/% | Mask mAP50-95/% | Params/M | FPS | GFLOPs | Model Size/MB |
|---|---|---|---|---|---|---|---|---|
| YOLOv5n-seg [50] | 90.5 | 88.5 | 94.6 | 76.5 | 1.88 | 329 | 6.7 | 3.96 |
| YOLOv9c-seg | 90.6 | 89.5 | 95.4 | 82.5 | 27.63 | 22 | 157.6 | 56.30 |
| YOLOv8n-seg (Baseline) | 90.9 | 88.0 | 94.9 | 80.4 | 3.26 | 215 | 12.0 | 6.50 |
| CabbageNet (Ours) | 92.2 | 87.2 | 95.1 | 80.6 | 3.21 | 154 | 15.1 | 6.46 |
Ablation study of the proposed modules on the YOLOv8n-seg baseline:

| Models | C2f-DAttention | ADown | SOEP | Mask Precision/% | Mask Recall/% | Mask mAP50/% | Mask mAP50-95/% | Params/M | FPS | GFLOPs | Size/MB |
|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv8n-seg (Baseline) | – | – | – | 90.9 | 88.0 | 94.9 | 80.4 | 3.26 | 215 | 12.0 | 6.50 |
| + C2f-DAttention | ✓ | – | – | 92.0 | 86.7 | 94.9 | 80.5 | 3.32 | 193 | 12.0 | 6.64 |
| + C2f-DAttention + ADown | ✓ | ✓ | – | 91.6 | 87.2 | 95.1 | 80.6 | 2.91 | 162 | 11.3 | 5.87 |
| CabbageNet (Ours) | ✓ | ✓ | ✓ | 92.2 | 87.2 | 95.1 | 80.6 | 3.21 | 154 | 15.1 | 6.46 |