From Convolutional Networks to Vision Transformers: Evolution of Deep Learning in Agricultural Pest and Disease Identification
Abstract
1. Introduction
2. Methods Based on Conventional Machine Learning
3. Methods Based on Deep Learning
3.1. Data Collection and Preprocessing
3.2. Models
3.2.1. CNN-Based Models
3.2.2. Vision Transformer-Based Models
3.2.3. CNN–Transformer Hybrid Models
4. Discussion
4.1. Small Sample Size of Data
4.2. Diversity of Pest and Disease Patterns
4.3. Adaptation Challenges in Complex Backgrounds
4.4. Model Demands for Lightweight and Real-Time Performance
5. Conclusions
Funding
Acknowledgments
Conflicts of Interest
References
- Zhang, X.; Yin, H.; Zhuang, C.; Ren, G.; Li, C.; Yang, Q.; Zhou, Y.; Feng, B. Current Situation and Analysis of the Standardization of Crop Pests, Diseases and Weeds. Stand. Sci. 2024, (Suppl. S2), 132–139. [Google Scholar]
- Li, Z.; Li, B.; Li, Z.; Zhan, Y.; Wang, L.; Gong, Q. Research progress in crop disease and pest identification based on deep learning. Hubei Agric. Sci. 2023, 62, 165–169. [Google Scholar]
- Wang, B.; Yang, M.; Cao, P.; Liu, Y. A novel embedded cross framework for high-resolution salient object detection. Appl. Intell. 2025, 55, 277. [Google Scholar] [CrossRef]
- Cai, H.; Wang, Y.; Luo, Y.; Mao, K. A Dual-Channel Collaborative Transformer for continual learning. Appl. Soft Comput. 2025, 171, 112792. [Google Scholar] [CrossRef]
- Saro, J.; Kavit, A. Review: Study on simple K-mean and modified K-mean clustering technique. Int. J. Comput. Sci. Eng. Technol. 2016, 6, 279–281. [Google Scholar]
- Suykens, J.A.K.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9, 293–300. [Google Scholar] [CrossRef]
- Jin, Y. Image Recognition of Four Kinds of Fruit Tree Diseases. Master’s Thesis, Liaoning Normal University, Dalian, China, 2021. [Google Scholar]
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
- Pearson, K. On lines and planes of closest fit to systems of points in space. Philos. Mag. 1901, 2, 559–572. [Google Scholar] [CrossRef]
- Pantazi, X.E.; Moshou, D.; Tamouridou, A.A. Automated leaf disease detection in different crop species through image features analysis and One Class Classifiers. Comput. Electron. Agric. 2019, 156, 96–104. [Google Scholar] [CrossRef]
- Li, Y. Research on Image Segmentation Algorithm Based on Superpixel and Graph Theory. Master’s Thesis, Chongqing University, Chongqing, China, 2020. [Google Scholar]
- Liu, H.; Zhu, S.; Shen, Y.; Tang, J. Fast Segmentation Algorithm of Tree Trunks Based on Multi-feature Fusion. Trans. Chin. Soc. Agric. Mach. 2020, 51, 221–229. [Google Scholar]
- Zhang, N. Color Image Segmentation Based on Contrast and GrabCut. Master’s Thesis, Hebei University, Baoding, China, 2018. [Google Scholar]
- Li, K.; Feng, Q.; Zhang, J. Co-Segmentation Algorithm for Complex Background Image of Cotton Seedling Leaves. J. Comput.-Aided Des. Comput. Graph. 2017, 29, 1871–1880. [Google Scholar]
- Lei, Y.; Han, D.; Zeng, Q.; He, D. Grading Method of Disease Severity of Wheat Stripe Rust Based on Hyperspectral Imaging Technology. Trans. Chin. Soc. Agric. Mach. 2018, 49, 226–232. [Google Scholar]
- Pan, Y.; Zhang, H.; Yan, J.; Zhang, H. Source Identification of Radix Glycyrrhizae (Licorice) Based on the Fusion of Hyperspectral and Texture Feature. J. Instrum. Anal. 2024, 43, 1745–1753. [Google Scholar]
- Dong, C.; Yang, T.; Chen, Q.; Liu, L.; Xiao, X.; Wei, Z.; Shi, C.; Shao, Y.; Gao, D. Application of hyperspectral imaging technology in non-destructive detection of apple quality. J. Fruit Sci. 2024, 41, 2582–2594. [Google Scholar]
- Mei, X.; Hu, Y.; Zhang, H.; Cai, Y.; Luo, K.; Meng, Y.; Song, Y.; Shan, W. Evaluation of drought status of potato leaves based on hyperspectral imaging. Agric. Res. Arid Areas 2024, 42, 246–254. [Google Scholar]
- Liu, Y.; Yin, Y.; Liu, J.; Yin, Y.; Chu, T.; Jiang, Y. Characterization and identification of weeds in the field using hyperspectral imaging. J. Tarim Univ. 2024, 36, 89–97. [Google Scholar]
- Singh, V. Sunflower leaf diseases detection using image segmentation based on particle swarm optimization. Artif. Intell. Agric. 2019, 3, 62–68. [Google Scholar] [CrossRef]
- Zhang, Z.; Li, M.; Shen, Z.; Chen, C.; Fang, S.; Du, K.; Yang, L.; Deng, T. Research on the Number of Crack in Wood Based on Acoustic Emission. For. Eng. 2025, 41, 59–66. [Google Scholar]
- Qiao, X.; Pan, X.; Wang, X.; Peng, J.; Zhao, X. Image segmentation of potato pests and diseases based on g-r component and k-means. J. Inn. Mong. Agric. Univ. (Nat. Sci. Ed.) 2021, 42, 84–87. [Google Scholar]
- Zhou, R.; Xi, J.; Ding, Y.; Duan, J.; Qi, B.; Yao, T.; Dong, S.; Liu, Y.; Ding, C.; Yang, G.; et al. Research on Curing Tobacco Image Segmentation Based on K-means Clustering Algorithm. J. Anhui Agric. Sci. 2024, 52, 232–237. [Google Scholar]
- Li, H.; Liu, J.; Wu, K. Image segmentation based on K-means algorithm. Mod. Comput. 2024, 30, 49–51, 91. [Google Scholar]
- Lyu, S.; Yang, H.; Fan, X. Image recognition technology for potato diseases and pests based on FT and superpixel fuzzy C-means clustering. Digit. Microgr. Imaging 2023, 3–5. [Google Scholar]
- Yuan, Q.; Deng, H.; Wang, X. Citrus disease and insect pest area segmentation based on superpixel fast fuzzy C-means clustering and support vector machine. J. Comput. Appl. 2021, 41, 563–570. [Google Scholar]
- Liu, J.; Wang, X. Plant diseases and pests detection based on deep learning: A review. Plant Methods 2021, 17, 22. [Google Scholar] [CrossRef] [PubMed]
- Liu, W.; Wang, Z.; Liu, X.; Zeng, N.; Liu, Y.; Alsaadi, F.E. A survey of deep neural network architectures and their applications. Neurocomputing 2017, 234, 11–26. [Google Scholar] [CrossRef]
- Zhang, Y.; Wa, S.; Zhang, L.; Lv, C. Automatic plant disease detection based on tranvolution detection network with GAN modules using leaf images. Front. Plant Sci. 2022, 13, 875693. [Google Scholar] [CrossRef]
- Thenmozhi, K.; Reddy, U.S. Crop pest classification based on deep convolutional neural network and transfer learning. Comput. Electron. Agric. 2019, 164, 104906. [Google Scholar] [CrossRef]
- Abbas, A.; Jain, S.; Gour, M.; Vankudothu, S. Tomato plant disease detection using transfer learning with C-GAN synthetic images. Comput. Electron. Agric. 2021, 187, 106279. [Google Scholar] [CrossRef]
- Hu, K.; Liu, Y.M.; Nie, J.; Zheng, X.; Zhang, W.; Liu, Y.; Xie, T. Rice pest identification based on multi-scale double-branch GAN-ResNet. Front. Plant Sci. 2023, 14, 1167121. [Google Scholar] [CrossRef]
- Bates, E.; Popović, M.; Marsh, C.; Clark, R.; Kovac, M.; Kocer, B.B. Leaf level Ash Dieback Disease Detection and Online Severity Estimation with UAVs. IEEE Access 2025, 13, 55499–55511. [Google Scholar] [CrossRef]
- Wang, Q.; Chen, J.; Song, Y.; Li, X.; Xu, W. Fusing visual quantified features for heterogeneous traffic flow prediction. Promet-Traffic Transp. 2024, 36, 1068–1077. [Google Scholar] [CrossRef]
- Chen, J.; Ye, H.; Ying, Z.; Sun, Y.; Xu, W. Dynamic Trend Fusion Module for Traffic Flow Prediction. arXiv 2025, arXiv:2501.10796. [Google Scholar] [CrossRef]
- Chen, J.; Pan, S.; Peng, W.; Xu, W. Bilinear Spatiotemporal Fusion Network: An efficient approach for traffic flow prediction. Neural Netw. 2025, 187, 107382. [Google Scholar] [CrossRef] [PubMed]
- Rahman, C.R.; Arko, P.S.; Ali, M.E.; Khan, M.A.I.; Apon, S.H.; Nowrin, F.; Wasif, A. Identification and recognition of rice diseases and pests using convolutional neural networks. Biosyst. Eng. 2020, 194, 112–120. [Google Scholar] [CrossRef]
- Sangha, H.S.; Darr, M.J. Influence of Model Size and Image Augmentations on Object Detection in Low-Contrast Complex Background Scenes. AI 2025, 6, 52. [Google Scholar] [CrossRef]
- Xu, C.; Yu, C.; Zhang, S.; Wang, X. Multi-scale convolution-capsule network for crop insect pest recognition. Electronics 2022, 11, 1630. [Google Scholar] [CrossRef]
- Thakur, P.S.; Sheorey, T.; Ojha, A. VGG-ICNN: A Lightweight CNN model for crop disease identification. Multimed. Tools Appl. 2023, 82, 497–520. [Google Scholar] [CrossRef]
- Gong, X.; Zhang, S. A high-precision detection method of apple leaf diseases using improved faster R-CNN. Agriculture 2023, 13, 240. [Google Scholar] [CrossRef]
- Du, L.; Sun, Y.; Chen, S.; Feng, J.; Zhao, Y.; Yan, Z.; Zhang, X.; Bian, Y. A novel object detection model based on faster R-CNN for Spodoptera frugiperda according to feeding trace of corn leaves. Agriculture 2022, 12, 248. [Google Scholar] [CrossRef]
- Rong, M.; Wang, Z.; Ban, B.; Guo, X. Pest Identification and Counting of Yellow Plate in Field Based on Improved Mask R-CNN. Discret. Dyn. Nat. Soc. 2022, 2022, 1913577. [Google Scholar] [CrossRef]
- Liu, S.; Fu, S.; Hu, A.; Ma, P.; Hu, X.; Tian, X.; Zhang, H.; Liu, S. Research on Insect Pest Identification in Rice Canopy Based on GA-Mask R-CNN. Agronomy 2023, 13, 2155. [Google Scholar] [CrossRef]
- Lee, M.G.; Cho, H.B.; Youm, S.K.; Kim, S.W. Detection of pine wilt disease using time series UAV imagery and deep learning semantic segmentation. Forests 2023, 14, 1576. [Google Scholar] [CrossRef]
- Li, K.R.; Duan, L.J.; Deng, Y.J.; Liu, J.L.; Long, C.F.; Zhu, X.H. Pest Detection Based on Lightweight Locality-Aware Faster R-CNN. Agronomy 2024, 14, 2303. [Google Scholar] [CrossRef]
- Dong, Q.; Sun, L.; Han, T.; Cai, M.; Gao, C. PestLite: A novel YOLO-based deep learning technique for crop pest detection. Agriculture 2024, 14, 228. [Google Scholar] [CrossRef]
- Liu, J.; Wang, X. Tomato diseases and pests detection based on improved Yolo V3 convolutional neural network. Front. Plant Sci. 2020, 11, 898. [Google Scholar] [CrossRef]
- Soeb, M.J.A.; Jubayer, M.F.; Tarin, T.A.; Al Mamun, M.R.; Ruhad, F.M.; Parven, A.; Mubarak, N.M.; Karri, S.L.; Meftaul, I.M. Tea leaf disease detection and identification based on YOLOv7 (YOLO-T). Sci. Rep. 2023, 13, 6078. [Google Scholar] [CrossRef]
- Wang, L.; Shi, W.; Tang, Y.; Liu, Z.; He, X.; Xiao, H.; Yang, Y. Transfer Learning-Based Lightweight SSD Model for Detection of Pests in Citrus. Agronomy 2023, 13, 1710. [Google Scholar] [CrossRef]
- Türkoğlu, M.; Hanbay, D. Plant disease and pest detection using deep learning-based features. Turk. J. Electr. Eng. Comput. Sci. 2019, 27, 1636–1651. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Wu, J.; Wen, C.; Chen, H.; Ma, Z.; Zhang, T.; Su, H.; Yang, C. DS-DETR: A model for tomato leaf disease segmentation and damage evaluation. Agronomy 2022, 12, 2023. [Google Scholar] [CrossRef]
- Chen, D.; Lin, J.; Wang, H.; Wu, K.; Lu, Y.; Zhou, X.; Zhang, J. Pest detection model based on multi-scale dataset. Trans. Chin. Soc. Agric. Eng. 2024, 40, 196–206. [Google Scholar]
- Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Ni, L.M.; Shum, H.Y. DINO: DETR with improved denoising anchor boxes for end-to-end object detection. arXiv 2022, arXiv:2203.03605. [Google Scholar]
- Jiao, L.; Dong, S.; Zhang, S.; Xie, C.; Wang, H. AF-RCNN: An anchor-free convolutional neural network for multi-categories agricultural pest detection. Comput. Electron. Agric. 2020, 174, 105522. [Google Scholar] [CrossRef]
- Zhang, Y.; Chen, L.; Yuan, Y. Multimodal fine-grained transformer model for pest recognition. Electronics 2023, 12, 2620. [Google Scholar] [CrossRef]
- Cap, Q.H.; Uga, H.; Kagiwada, S.; Iyatomi, H. Leafgan: An effective data augmentation method for practical plant disease diagnosis. IEEE Trans. Autom. Sci. Eng. 2020, 19, 1258–1267. [Google Scholar] [CrossRef]
- Jiang, Y.; Chang, S.; Wang, Z. Transgan: Two pure transformers can make one strong gan, and that can scale up. Adv. Neural Inf. Process. Syst. 2021, 34, 14745–14758. [Google Scholar]
- Guo, J.; Han, K.; Wu, H.; Tang, Y.; Chen, X.; Wang, Y.; Xu, C. CMT: Convolutional neural networks meet vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12175–12185. [Google Scholar]
- Shen, Z.; Fu, R.; Lin, C.; Zheng, S. COTR: Convolution in transformer network for end-to-end polyp detection. In Proceedings of the 2021 7th International Conference on Computer and Communications (ICCC), Chengdu, China, 10–13 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1757–1761. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
- Zhang, T.; Li, K.; Chen, X.; Zhong, C.; Luo, B.; Grijalva, I.; McCornack, B.; Flippo, D.; Sharda, A.; Wang, G.; et al. Aphid cluster recognition and detection in the wild using deep learning models. Sci. Rep. 2023, 13, 13410. [Google Scholar] [CrossRef] [PubMed]
- Chang, Z.; Xu, M.; Wei, Y.; Lian, J.; Zhang, C.; Li, C. UNeXt: An Efficient Network for the Semantic Segmentation of High-Resolution Remote Sensing Images. Sensors 2024, 24, 6655. [Google Scholar] [CrossRef]
- Dai, Z.; Liu, H.; Le, Q.V.; Tan, M. Coatnet: Marrying convolution and attention for all data sizes. Adv. Neural Inf. Process. Syst. 2021, 34, 3965–3977. [Google Scholar]
- Li, Y.; Wu, C.Y.; Fan, H.; Mangalam, K.; Xiong, B.; Malik, J.; Feichtenhofer, C. MViTv2: Improved multiscale vision transformers for classification and detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 4804–4814. [Google Scholar]
- Mehta, S.; Rastegari, M. MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv 2021, arXiv:2110.02178. [Google Scholar]
- Zhang, M.; Yang, W.; Chen, D.; Fu, C.; Wei, F. AM-MSFF: A Pest Recognition Network Based on Attention Mechanism and Multi-Scale Feature Fusion. Entropy 2024, 26, 431. [Google Scholar] [CrossRef] [PubMed]
- Yang, R.; Guo, Y.; Hu, Z.; Gao, R.; Yang, H. Semantic segmentation of cucumber leaf disease spots based on ECA-SegFormer. Agriculture 2023, 13, 1513. [Google Scholar] [CrossRef]
- Li, X.; Li, S. Transformer help CNN see better: A lightweight hybrid apple disease identification model based on transformers. Agriculture 2022, 12, 884. [Google Scholar] [CrossRef]
- Dixit, A.K.; Verma, R. Advanced hybrid model for multi paddy diseases detection using deep learning. EAI Endorsed Trans. Pervasive Health Technol. 2023, 9. [Google Scholar] [CrossRef]
- Yu, H.; Song, J.; Chen, C.; Heidari, A.A.; Liu, J.; Chen, H.; Zaguia, A.; Mafarja, M. Image segmentation of Leaf Spot Diseases on Maize using multi-stage Cauchy-enabled grey wolf algorithm. Eng. Appl. Artif. Intell. 2022, 109, 104653. [Google Scholar] [CrossRef]
- Hong, S.J.; Nam, I.; Kim, S.Y.; Kim, E.; Lee, C.H.; Ahn, S.; Park, I.K.; Kim, G. Automatic pest counting from pheromone trap images using deep learning object detectors for Matsucoccus thunbergianae monitoring. Insects 2021, 12, 342. [Google Scholar] [CrossRef]
- Kang, J.; Zhang, W.; Xia, Y.; Liu, W. A Study on Maize Leaf Pest and Disease Detection Model Based on Attention and Multi-Scale Features. Appl. Sci. 2023, 13, 10441. [Google Scholar] [CrossRef]
- Wu, X.; Fan, X.; Luo, P.; Choudhury, S.D.; Tjahjadi, T.; Hu, C. From laboratory to field: Unsupervised domain adaptation for plant disease recognition in the wild. Plant Phenomics 2023, 5, 0038. [Google Scholar] [CrossRef]
References | Crops | Model Features | Performance and Restriction
---|---|---|---
[10,11,12,13,14] | 18 crops | |
[15,16,17,18,19] | Wheat | |
[20,21] | Sunflower (Helianthus annuus) | |
[22,23,24] | Potatoes | |
[25,26] | Potatoes | |
Model | Inference Speed (FPS) | Accuracy (mAP) | Resource Usage (Video Memory/Training Time)
---|---|---|---
R-CNN | 0.3–1.5 (slowest) | 62–68% | High video memory usage; long training time
Fast R-CNN | 2–5 (medium) | 70–76% | Lower memory usage; shorter training time
Faster R-CNN | 5–12 (fast) | 78–85% | Higher memory footprint; end-to-end training with short inference time
Algorithm | Category | mAP | FPS | Characteristics | Limitations
---|---|---|---|---|---
YOLO | Single-stage | 72.4–80.3% | 45–120 | High real-time performance, simple deployment and low demand for computing resources | Lower accuracy on small objects and missed detections of dense objects
Faster R-CNN | Two-stage | 78.2–85.6% | 5–12 | High detection accuracy and adaptability to complex scenes | Slow speed and high computing resource requirements
SSD | Single-stage | 68.9–76.8% | 22–40 | Multi-scale detection that balances speed and accuracy | Weaker small-object detection than RetinaNet
RetinaNet | Single-stage | 75.1–83.5% | 10–18 | Strong small-object detection and can address class imbalance | Slower than YOLO and higher memory usage
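The mAP figures quoted above are mean average precision: the per-class average precision (area under the interpolated precision-recall curve) averaged over all classes. A minimal, library-free sketch of the per-class AP computation; the function name and the all-point interpolation choice are illustrative, not taken from any cited work:

```python
def average_precision(scores, matches, n_gt):
    """Average precision for one detection class.

    scores:  detection confidences.
    matches: True for a true positive, False for a false positive.
    n_gt:    number of ground-truth objects in this class.
    mAP is this value averaged over all classes.
    """
    # Sweep detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if matches[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # All-point interpolation: make precision non-increasing
    # from right to left, then integrate over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap
```

With detections scored [0.9, 0.8, 0.7] matched as [TP, FP, TP] against two ground-truth objects, this yields an AP of 5/6.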
Algorithm | Key Improvement Points |
---|---|
YOLOv3 | Introduction of multi-scale detection and residual networks (Darknet-53) |
YOLOv4 | Introduction of the GIoU loss function, an optimised backbone network and data augmentation strategies |
YOLOv5 | Lightweight design and modular structure with adaptive learning mechanism |
YOLOv6 | Industrial-grade optimisation and re-parameterised design |
YOLOv7 | Introduction of decoupled head, dynamic training strategies and efficient module fusion |
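The GIoU loss noted for YOLOv4 extends plain IoU so that the loss stays informative (negative GIoU) even when predicted and ground-truth boxes do not overlap. A minimal sketch for axis-aligned boxes; the helper name and the (x1, y1, x2, y2) corner convention are assumptions for illustration:

```python
def giou(box_a, box_b):
    """Generalised IoU for boxes given as (x1, y1, x2, y2).

    GIoU = IoU - |C \ (A U B)| / |C|, where C is the smallest box
    enclosing both A and B. Used as a regression loss via 1 - GIoU.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection area (zero if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box C; the penalty term shrinks to zero
    # as the two boxes approach each other.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c_area = cw * ch
    return iou - (c_area - union) / c_area
```

For identical boxes GIoU equals 1; for two disjoint unit boxes one unit apart it is -1/3, so the gradient still points the prediction toward the target.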
Dimension | Transformer | CNN
---|---|---
Feature extraction mechanism | Global dependency modelling | Local feature capture
Computational complexity | O(n²); high computational cost for long sequences | O(n); pooling operations can further reduce computation
Space complexity | Higher; the attention matrix alone is O(n²) | Relatively low
Data requirements | Large-scale data | Medium-sized data
Parallelisation capability | High; full-sequence parallel computing | Moderate; sliding-window computation is only partially parallelised
Robustness to occlusion/noise | Higher; contextual information compensates for occluded areas | Relatively low; sensitive to localised occlusion
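The O(n²) cost attributed to the Transformer in the table comes from the n-by-n attention matrix built over all token pairs. A minimal NumPy sketch of scaled dot-product self-attention, with the learned projections omitted (Q = K = V = x) purely for illustration:

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over n tokens of dimension d.

    The (n, n) score matrix is the source of the Transformer's O(n^2)
    time and space cost, in contrast to a convolution whose cost grows
    linearly with n for a fixed kernel size.
    """
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)  # (n, n): the quadratic term
    # Numerically stable row-wise softmax.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x  # each output token mixes all input tokens

out = self_attention(np.random.default_rng(0).normal(size=(8, 4)))
assert out.shape == (8, 4)  # output keeps the sequence shape
```

Because every output token attends to every input token, occluded regions can be compensated by context, which is the robustness advantage listed for Transformers above.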
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, M.; Liu, C.; Li, Z.; Yin, B. From Convolutional Networks to Vision Transformers: Evolution of Deep Learning in Agricultural Pest and Disease Identification. Agronomy 2025, 15, 1079. https://doi.org/10.3390/agronomy15051079