Detection of Rice Pests Based on Self-Attention Mechanism and Multi-Scale Feature Fusion
Simple Summary
Abstract
1. Introduction
2. Materials and Methods
2.1. Image Dataset
2.2. The Proposed Method (YOLO-GBS)
2.2.1. Global Context Attention Mechanism
2.2.2. Multi-Scale Feature Fusion
2.2.3. Swin Transformer
2.2.4. Additional Detection Head
2.3. Experiment Environment and Model Evaluation
3. Results
3.1. Ablation Studies
3.2. Comparison of Various Mainstream Networks
3.3. Model Generalization Capability
3.4. Grad-CAM Visualisation
4. Discussion
5. Conclusions
- Based on a self-built rice pest dataset with seven categories, the improved YOLO-GBS detection algorithm achieves a mean average precision of 79.8%, 5.4% higher than the original YOLOv5s, and it also performs well in complex scenes.
- Compared with common object detection algorithms such as YOLOv3, Faster RCNN, and SSD, YOLO-GBS delivers superior detection accuracy at competitive speed, meeting both the accuracy and real-time requirements of rice pest detection.
- The detection performance of YOLO-GBS is also evaluated on a large-scale pest dataset. The experimental results show that the improved model has good robustness and generalization, suggesting it could be further applied to pest detection in other crops.
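The mean average precision (mAP) cited above is the mean of the per-class average precision (AP) values over the seven pest categories. A minimal sketch of that final averaging step; the per-class AP values below are illustrative placeholders, not the paper's actual numbers:

```python
def mean_average_precision(per_class_ap):
    """Mean of per-class average precision (AP) values.

    Each AP is the area under that class's precision-recall curve,
    typically computed at a fixed IoU threshold such as 0.5.
    """
    return sum(per_class_ap) / len(per_class_ap)

# Illustrative AP values for seven pest classes (not from the paper).
ap_values = [0.82, 0.76, 0.85, 0.79, 0.74, 0.81, 0.80]
map_score = mean_average_precision(ap_values)
print(f"mAP = {map_score:.3f}")
```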
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
| Methods | Precision (%) | Recall (%) | mAP (%) |
|---|---|---|---|
| YOLOv5s (baseline) | 71.2 | 68.4 | 74.4 |
| YOLOv5s + P6 | 71.9 | 70.7 | 75.1 |
| YOLOv5s + P6 + BiFPN | 72.8 | 71.2 | 75.4 |
| YOLOv5s + P6 + BiFPN + GC | 73.2 | 72.7 | 76.5 |
| YOLO-GBS (previous + Swin Transformer) | 73.8 | 75.1 | 79.8 |
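The BiFPN row in the ablation table refers to bi-directional feature fusion with learnable, normalized fusion weights. A sketch of BiFPN-style fast normalized fusion of two feature maps in NumPy; the shapes and weight values are illustrative, not the paper's implementation:

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN-style fusion: ReLU the learnable weights, then take a
    normalized weighted sum of same-shaped feature maps."""
    w = np.maximum(weights, 0.0)   # ReLU keeps fusion weights non-negative
    w = w / (w.sum() + eps)        # normalize so the weights sum to ~1
    return sum(wi * f for wi, f in zip(w, features))

# Two illustrative 4x4 single-channel feature maps.
f1 = np.ones((4, 4))
f2 = np.full((4, 4), 3.0)
fused = fast_normalized_fusion([f1, f2], np.array([1.0, 1.0]))
print(fused[0, 0])  # close to 2.0: equal weights average the two maps
```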
| Methods | mAP (%) | Detection Time (ms) |
|---|---|---|
| SSD300 | 68.3 | 11.9 |
| YOLOv3-tiny | 69.6 | 1.9 |
| YOLOv3 | 74.6 | 7.6 |
| Faster RCNN | 75.4 | 18.3 |
| YOLO-GBS | 79.8 | 3.2 |
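The per-image detection times above can be read as throughput via FPS = 1000 / (time in ms). A small sketch of that conversion using the table's values:

```python
# Per-image detection latency in milliseconds, from the comparison table.
detection_time_ms = {
    "SSD300": 11.9,
    "YOLOv3-tiny": 1.9,
    "YOLOv3": 7.6,
    "Faster RCNN": 18.3,
    "YOLO-GBS": 3.2,
}

# Convert per-image latency (ms) to throughput (frames per second).
fps = {name: 1000.0 / t for name, t in detection_time_ms.items()}
for name, value in fps.items():
    print(f"{name}: {value:.1f} FPS")
```

At 3.2 ms per image (roughly 312 FPS), YOLO-GBS is comfortably within real-time requirements while retaining the highest mAP in the table.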
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Hu, Y.; Deng, X.; Lan, Y.; Chen, X.; Long, Y.; Liu, C. Detection of Rice Pests Based on Self-Attention Mechanism and Multi-Scale Feature Fusion. Insects 2023, 14, 280. https://doi.org/10.3390/insects14030280