A Multimodal Parallel Transformer Framework for Apple Disease Detection and Severity Classification with Lightweight Optimization
Abstract
1. Introduction
- A parallel transformer lesion segmentation network is introduced to process feature maps at multiple scales, extracting multigranular lesion features to improve the accuracy of recognizing similar diseases, such as apple scab and black rot.
- Multimodal data fusion leverages environmental data collected by sensors, enabling the model to account for variations in disease phenotypes across regions and climates, enhancing robustness in diverse environments.
- An automated acquisition workflow is designed to address the multidimensional and multi-angular characteristics of apple diseases. Video frames captured with handheld devices are combined with diffusion-based generation to reconstruct the complete apple surface, enabling comprehensive detection and precise severity grading.
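
As a rough illustration of the fusion idea in the second contribution, the sketch below lets image tokens attend to sensor tokens with single-head scaled dot-product cross-attention. It is a generic formulation and not the paper's fusion module: the function names, the absence of learned projections, and the residual addition are assumptions made to keep the example short.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_tokens, sensor_tokens):
    """Hypothetical fusion step: image tokens are queries, sensor tokens are keys and values.

    No learned projection matrices are used (an assumption for brevity);
    a real module would project queries, keys, and values first.
    """
    dim = len(sensor_tokens[0])
    fused = []
    for query in image_tokens:
        # Scaled dot-product score between this image token and every sensor token.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in sensor_tokens
        ]
        weights = softmax(scores)
        # Weighted sum of sensor tokens gives the environmental context vector.
        context = [
            sum(w * value[i] for w, value in zip(weights, sensor_tokens))
            for i in range(dim)
        ]
        # Residual addition keeps the original image feature (assumed design choice).
        fused.append([q + c for q, c in zip(query, context)])
    return fused


# Toy inputs: two image tokens and two identical sensor tokens, all of dimension 2.
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
sensor_tokens = [[0.5, 0.5], [0.5, 0.5]]
fused = cross_attention(image_tokens, sensor_tokens)
```

Because both sensor tokens are identical in this toy input, every image token receives the same context vector, which makes the output easy to check by hand.
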
2. Materials and Methods
2.1. Dataset Construction
2.1.1. Image Dataset Collection
2.1.2. Sensors Dataset Collection
2.1.3. Dataset Preprocessing
2.2. Multimodal Apple Disease Detection System
2.3. Multimodal Parallel Transformer Detection Network
2.4. Sensor-Image Data Fusion Module
2.5. Experimental Design
2.5.1. Experimental Environment
2.5.2. Baseline Methods
2.5.3. Evaluation Metrics
3. Results and Discussion
3.1. Apple Disease Classification Results
3.2. Analysis of Detection Accuracy Across Different Apple Diseases
3.3. Analysis of Apple Disease Severity Classification Results
3.4. Confidence Analysis of Experimental Results
3.5. t-SNE Validation Experiment for Consecutive Frame Diffusion Generation
3.6. Validation of Deployment on Handheld Mobile Devices
3.7. Cross-Field Generalization Test
3.8. Limitations and Future Work
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Mhamed, M.; Zhang, Z.; Yu, J.; Li, Y.; Zhang, M. Advances in apple’s automated orchard equipment: A comprehensive research. Comput. Electron. Agric. 2024, 221, 108926.
2. FAO. World Food and Agriculture—Statistical Yearbook; FAO: Rome, Italy, 2023.
3. Nabi, F.; Jamwal, S.; Padmanbh, K. Wireless sensor network in precision farming for forecasting and monitoring of apple disease: A survey. Int. J. Inf. Technol. 2022, 14, 769–780.
4. Zhang, W.; Zhou, G.; Chen, A.; Hu, Y. Deep multi-scale dual-channel convolutional neural network for Internet of Things apple disease detection. Comput. Electron. Agric. 2022, 194, 106749.
5. Shin, J.; Chang, Y.K.; Heung, B.; Nguyen-Quang, T.; Price, G.W.; Al-Mallahi, A. A deep learning approach for RGB image-based powdery mildew disease detection on strawberry leaves. Comput. Electron. Agric. 2021, 183, 106042.
6. Sharma, M.; Jindal, V. Approximation techniques for apple disease detection and prediction using computer enabled technologies: A review. Remote Sens. Appl. Soc. Environ. 2023, 32, 101038.
7. Logashov, D.; Shadrin, D.; Somov, A.; Pukalchik, M.; Uryasheva, A.; Gupta, H.P.; Rodichenko, N. Apple trees diseases detection through computer vision in embedded systems. In Proceedings of the 2021 IEEE 30th International Symposium on Industrial Electronics (ISIE), Kyoto, Japan, 20–23 June 2021; IEEE: New York, NY, USA, 2021; pp. 1–6.
8. Zhang, Y.; Wa, S.; Zhang, L.; Lv, C. Automatic plant disease detection based on tranvolution detection network with GAN modules using leaf images. Front. Plant Sci. 2022, 13, 875693.
9. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
10. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002.
11. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Álvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090.
12. Chen, J.; Mao, Q.; Liu, D. Dual-path transformer network: Direct context-aware modeling for end-to-end monaural speech separation. arXiv 2020, arXiv:2007.13975.
13. Chen, C.; Fan, Q.; Panda, R. CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, QC, Canada, 11–17 October 2021.
14. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning, Online, 18–24 July 2021.
15. Li, J.; Selvaraju, R.R.; Gotmare, A.D.; Joty, S.R.; Xiong, C.; Hoi, S.C.H. Align before Fuse: Vision and Language Representation Learning with Momentum Distillation. Adv. Neural Inf. Process. Syst. 2021, 34, 9694–9705.
16. Wang, W.; Bao, H.; Dong, L.; Bjorck, J.; Peng, Z.; Liu, Q.; Aggarwal, K.; Mohammed, O.K.; Singhal, S.; Som, S.; et al. Image as a foreign language: Beit pretraining for vision and vision-language tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 19175–19186.
17. Zhang, Y.; He, N.; Yang, J.; Li, Y.; Wei, D.; Huang, Y.; Zhang, Y.; He, Z.; Zheng, Y. mmformer: Multimodal medical transformer for incomplete multimodal learning of brain tumor segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Cham, Switzerland, 2022; pp. 107–117.
18. Zhu, X.; Zhu, J.; Li, H.; Wu, X.; Wang, X.; Li, H.; Wang, X.; Dai, J. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022.
19. Uryasheva, A.; Kalashnikova, A.; Shadrin, D.; Evteeva, K.; Moskovtsev, E.; Rodichenko, N. Computer vision-based platform for apple leaves segmentation in field conditions to support digital phenotyping. Comput. Electron. Agric. 2022, 201, 107269.
20. Gongal, A.; Silwal, A.; Amatya, S.; Karkee, M.; Zhang, Q.; Lewis, K. Apple crop-load estimation with over-the-row machine vision system. Comput. Electron. Agric. 2016, 120, 26–35.
21. Zhou, X.; Chen, S.; Ren, Y.; Zhang, Y.; Fu, J.; Fan, D.; Lin, J.; Wang, Q. Atrous Pyramid GAN Segmentation Network for Fish Images with High Performance. Electronics 2022, 11, 911.
22. Zhang, Y.; Wa, S.; Liu, Y.; Zhou, X.; Sun, P.; Ma, Q. High-accuracy detection of maize leaf diseases CNN based on multi-pathway activation function module. Remote Sens. 2021, 13, 4218.
23. Zhang, J.; He, L.; Karkee, M.; Zhang, Q.; Zhang, X.; Gao, Z. Branch detection for apple trees trained in fruiting wall architecture using depth features and Regions-Convolutional Neural Network (R-CNN). Comput. Electron. Agric. 2018, 155, 386–393.
24. Gao, F.; Fu, L.; Zhang, X.; Majeed, Y.; Li, R.; Karkee, M.; Zhang, Q. Multi-class fruit-on-plant detection for apple in SNAP system using Faster R-CNN. Comput. Electron. Agric. 2020, 176, 105634.
25. Kutyrev, A.; Kiktev, N.A.; Kalivoshko, O.; Rakhmedov, R.S. Recognition and Classification Apple Fruits Based on a Convolutional Neural Network Model. In Proceedings of the Information Technology and Implementation, Kyiv, Ukraine, 30 November–2 December 2022; pp. 90–101.
26. Patrício, D.I.; Rieder, R. Computer vision and artificial intelligence in precision agriculture for grain crops: A systematic review. Comput. Electron. Agric. 2018, 153, 69–81.
27. Majeed, Y.; Zhang, J.; Zhang, X.; Fu, L.; Karkee, M.; Zhang, Q.; Whiting, M.D. Deep learning based segmentation for automated training of apple trees on trellis wires. Comput. Electron. Agric. 2020, 170, 105277.
28. Shrestha, G.; Das, M.; Dey, N. Plant disease detection using CNN. In Proceedings of the 2020 IEEE applied signal processing conference (ASPCON), Kolkata, India, 7–9 October 2020; IEEE: New York, NY, USA, 2020; pp. 109–113.
29. Zhang, Y.; Lv, C. TinySegformer: A lightweight visual segmentation model for real-time agricultural pest detection. Comput. Electron. Agric. 2024, 218, 108740.
30. Xiao, B.; Nguyen, M.; Yan, W.Q. Apple ripeness identification from digital images using transformers. Multimed. Tools Appl. 2024, 83, 7811–7825.
31. Aslan, E.; Özüpak, Y. Diagnosis and accurate classification of apple leaf diseases using Vision transformers. Comput. Decis. Making Int. J. 2024, 1, 1–12.
32. Borhani, Y.; Khoramdel, J.; Najafi, E. A deep learning based approach for automated plant disease classification using vision transformer. Sci. Rep. 2022, 12, 11554.
33. Guo, Y.; Lan, Y.; Chen, X. CST: Convolutional Swin Transformer for detecting the degree and types of plant diseases. Comput. Electron. Agric. 2022, 202, 107407.
34. Raza, A.; Safdar, M.; Ali, H.; Iftikhar, M.; Ishfaqe, Q.; Al Ansari, M.S.; Wang, P.; Khan, A.S. Automated Plant Disease Detection: A Convergence of Agriculture and Technology. In Agriculture and Aquaculture Applications of Biosensors and Bioelectronics; IGI Global: Hershey, PA, USA, 2024; pp. 266–295.
35. Patle, K.S.; Saini, R.; Kumar, A.; Palaparthy, V.S. Field evaluation of smart sensor system for plant disease prediction using LSTM network. IEEE Sens. J. 2021, 22, 3715–3725.
36. Gui, P.; Dang, W.; Zhu, F.; Zhao, Q. Towards automatic field plant disease recognition. Comput. Electron. Agric. 2021, 191, 106523.
37. MacHardy, W.E. Apple Scab: Biology, Epidemiology, and Management; APS Press: St. Paul, MN, USA, 1996.
38. Ji-Chuan, K.; Crous, P.W.; Mchau, G.R.; Serdani, M.; Shan-Mei, S. Phylogenetic analysis of Alternaria spp. associated with apple core rot and citrus black rot in South Africa. Mycol. Res. 2002, 106, 1151–1162.
39. Roberts, R. Postharvest biological control of gray mold of apple by Cryptococcus laurentii. Phytopathology 1990, 80, 526–530.
40. Turechek, W.W. Apple diseases and their management. In Diseases of Fruits and Vegetables Volume I: Diagnosis and Management; Springer: Dordrecht, The Netherlands, 2004; pp. 1–108.
41. Strickland, D.A.; Hodge, K.T.; Cox, K.D. An examination of apple powdery mildew and the biology of Podosphaera leucotricha from past to present. Plant Health Prog. 2021, 22, 421–432.
42. Kim, Y.S.; Balaraju, K.; Jeon, Y. Biological control of apple anthracnose by Paenibacillus polymyxa APEC128, an antagonistic rhizobacterium. Plant Pathol. J. 2016, 32, 251.
43. Tang, W.; Ding, Z.; Zhou, Z.; Wang, Y.; Guo, L. Phylogenetic and pathogenic analyses show that the causal agent of apple ring rot in China is Botryosphaeria dothidea. Plant Dis. 2012, 96, 486–496.
44. He, Y.; Zhang, N.; Ge, X.; Li, S.; Yang, L.; Kong, M.; Guo, Y.; Lv, C. Passion Fruit Disease Detection Using Sparse Parallel Attention Mechanism and Optical Sensing. Agriculture 2025, 15, 733.
45. Krishnamurthi, R.; Kumar, A.; Gopinathan, D.; Nayyar, A.; Qureshi, B. An overview of IoT sensor data processing, fusion, and analysis techniques. Sensors 2020, 20, 6076.
46. Zhou, Y.; Aryal, S.; Bouadjenek, M.R. A Comprehensive Review of Handling Missing Data: Exploring Special Missing Mechanisms. arXiv 2024, arXiv:2404.04905.
47. Li, X.; Li, H.; Lu, H.; Jensen, C.S.; Pandey, V.; Markl, V. Missing Value Imputation for Multi-attribute Sensor Data Streams via Message Propagation (Extended Version). arXiv 2023, arXiv:2311.07344.
48. Wei, X.; Zhang, T.; Li, Y.; Zhang, Y.; Wu, F. Multi-modality cross attention network for image and sentence matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10941–10950.
49. Lan, X.; Liu, L.; Wang, X. Dal-yolo: A multi-target detection model for UAV-based road maintenance integrating feature pyramid and attention mechanisms. J. Real-Time Image Process. 2025, 22, 105.
50. Zhang, G.; Lu, Y.; Jiang, X.; Jin, S.; Li, S.; Xu, M. LGGFormer: A dual-branch local-guided global self-attention network for surface defect segmentation. Adv. Eng. Inform. 2025, 64, 103099.
51. Loshchilov, I. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101.
52. Yang, L.; Zhang, H.; Zuo, Z.; Peng, J.; Yu, X.; Long, H.; Liao, Y. AFU-Net: A novel U-Net network for rice leaf disease segmentation. Appl. Eng. Agric. 2023, 39, 519–528.
53. Rehman, Z.U.; Khan, M.A.; Ahmed, F.; Damaševičius, R.; Naqvi, S.R.; Nisar, W.; Javed, K. Recognizing apple leaf diseases using a novel parallel real-time processing framework based on MASK RCNN and transfer learning: An application for smart agriculture. IET Image Process. 2021, 15, 2157–2168.
54. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
55. Nasir, I.M.; Bibi, A.; Shah, J.H.; Khan, M.A.; Sharif, M.; Iqbal, K.; Nam, Y.; Kadry, S. Deep learning-based classification of fruit diseases: An application for precision agriculture. Comput. Mater. Contin. 2021, 66, 1949–1962.
56. Wang, P.; Wang, S.; Lin, J.; Bai, S.; Zhou, X.; Zhou, J.; Wang, X.; Zhou, C. One-peace: Exploring one general representation model toward unlimited modalities. arXiv 2023, arXiv:2305.11172.
57. Bao, H.; Dong, L.; Piao, S.; Wei, F. Beit: Bert pre-training of image transformers. arXiv 2021, arXiv:2106.08254.
58. Wang, W.; Dai, J.; Chen, Z.; Huang, Z.; Li, Z.; Zhu, X.; Hu, X.; Lu, T.; Lu, L.; Li, H.; et al. Internimage: Exploring large-scale vision foundation models with deformable convolutions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 14408–14419.

Disease | Image Count |
---|---|
Apple Scab | 1791 |
Black Rot | 1802 |
Gray Mold | 1504 |
Apple Rot | 2179 |
Apple Powdery Mildew | 2253 |
Anthracnose | 2008 |
Apple Ring Rot | 1937 |
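
The per-disease counts above can be totalled and checked for class imbalance with a few lines of Python. This is only a sketch: it assumes the second column counts images per class, and the inverse-frequency weights are a common remedy for imbalance that the source does not say the authors used.

```python
# Per-disease counts copied from the dataset table above.
counts = {
    "Apple Scab": 1791,
    "Black Rot": 1802,
    "Gray Mold": 1504,
    "Apple Rot": 2179,
    "Apple Powdery Mildew": 2253,
    "Anthracnose": 2008,
    "Apple Ring Rot": 1937,
}

total = sum(counts.values())

# Fraction of the dataset taken up by each disease class.
shares = {name: n / total for name, n in counts.items()}

# Ratio between the largest and smallest class, a simple imbalance indicator.
imbalance_ratio = max(counts.values()) / min(counts.values())

# Hypothetical inverse-frequency class weights (not stated in the source):
# a perfectly balanced dataset would give every class a weight of 1.0.
class_weights = {name: total / (len(counts) * n) for name, n in counts.items()}

print(f"total samples: {total}")
print(f"largest/smallest class ratio: {imbalance_ratio:.2f}")
for name, weight in class_weights.items():
    print(f"{name}: share={shares[name]:.3f}, weight={weight:.3f}")
```

With these counts the rarest class (Gray Mold) gets the largest weight and the most common class (Apple Powdery Mildew) the smallest.
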

Sensor Type | Mean | Max | Min | Std. Dev. |
---|---|---|---|---|
Temperature (°C) | 21.3 | 36.5 | 7.8 | 6.2 |
Humidity (%) | 67.4 | 100.0 | 29.1 | 15.8 |
Atmospheric Pressure (hPa) | 1006.7 | 1023.4 | 988.2 | 7.6 |
Soil Temperature (°C) | 19.2 | 31.7 | 10.6 | 4.3 |
Harmful Gases (ppm) | 132.6 | 291.0 | 62.5 | 48.9 |
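
Sensor channels with such different ranges are usually rescaled before being fed to a network. The sketch below applies z-score standardization using the means and standard deviations from the table above; whether the authors standardize in this way is an assumption, and the dictionary keys and function name are hypothetical.

```python
# (mean, standard deviation) per sensor channel, copied from the table above.
SENSOR_STATS = {
    "temperature_c": (21.3, 6.2),
    "humidity_pct": (67.4, 15.8),
    "pressure_hpa": (1006.7, 7.6),
    "soil_temperature_c": (19.2, 4.3),
    "harmful_gases_ppm": (132.6, 48.9),
}


def standardize(reading):
    """Return z-scores for one sensor reading (dict of channel name -> raw value)."""
    return {
        channel: (value - SENSOR_STATS[channel][0]) / SENSOR_STATS[channel][1]
        for channel, value in reading.items()
    }


# Example: the hottest recorded air temperature and the average humidity.
z = standardize({"temperature_c": 36.5, "humidity_pct": 67.4})
print(z)
```

A reading equal to the channel mean maps to 0, and the maximum air temperature of 36.5 °C lies roughly 2.45 standard deviations above the mean.
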

Model | Precision | Recall | Accuracy | F1-Score | Model Size (MB) | FLOPs (G) |
---|---|---|---|---|---|---|
Tiny-Segformer [29] | 0.83 | 0.80 | 0.81 | 0.81 | 152 | 13.8 |
AFU-Net [52] | 0.85 | 0.82 | 0.83 | 0.84 | 224 | 24.5 |
Mask R-CNN [53] | 0.87 | 0.84 | 0.85 | 0.86 | 447 | 37.9 |
U-Net+ [54] | 0.89 | 0.86 | 0.87 | 0.88 | 852 | 32.2 |
CNN-based [55] | 0.90 | 0.88 | 0.89 | 0.89 | 528 | 12.4 |
ONE-PEACE [56] | 0.92 | 0.91 | 0.92 | 0.91 | 620 | 28.6 |
BEiT [57] | 0.91 | 0.89 | 0.90 | 0.90 | 400 | 45.2 |
InternImage [58] | 0.94 | 0.92 | 0.93 | 0.93 | 930 | 39.5 |
Proposed Method—without Sensor Data | 0.93 | 0.92 | 0.92 | 0.93 | 196 | 14.0 |
Proposed Method | 0.97 | 0.94 | 0.96 | 0.95 | 203 | 14.3 |
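
As a quick consistency check on the table above, the F1-score can be recomputed as the harmonic mean of precision and recall. The sketch below does this from the rounded table values; the source does not state how the reported F1 was averaged, so small differences from the reported column are expected.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the comparison table above.
rows = {
    "Tiny-Segformer": (0.83, 0.80, 0.81),
    "AFU-Net": (0.85, 0.82, 0.84),
    "Mask R-CNN": (0.87, 0.84, 0.86),
    "U-Net+": (0.89, 0.86, 0.88),
    "CNN-based": (0.90, 0.88, 0.89),
    "ONE-PEACE": (0.92, 0.91, 0.91),
    "BEiT": (0.91, 0.89, 0.90),
    "InternImage": (0.94, 0.92, 0.93),
    "Proposed (no sensor data)": (0.93, 0.92, 0.93),
    "Proposed": (0.97, 0.94, 0.95),
}

# Absolute gap between the recomputed and the reported F1 for each model.
gaps = {}
for model, (precision, recall, reported) in rows.items():
    recomputed = f1_score(precision, recall)
    gaps[model] = abs(recomputed - reported)
    print(f"{model}: recomputed={recomputed:.4f}, reported={reported:.2f}")
```

Every recomputed value stays within 0.01 of the reported one, which is consistent with the reported F1 having been computed from unrounded or per-class figures.
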

Model | Apple Scab | Black Rot | Gray Mold | Apple Rot | Apple Powdery Mildew | Anthracnose | Apple Ring Rot |
---|---|---|---|---|---|---|---|
Tiny-Segformer | 0.81 | 0.82 | 0.83 | 0.82 | 0.84 | 0.84 | 0.86 |
AFU-Net | 0.85 | 0.87 | 0.86 | 0.84 | 0.85 | 0.86 | 0.87 |
Mask R-CNN | 0.84 | 0.82 | 0.84 | 0.86 | 0.85 | 0.83 | 0.86 |
U-Net+ | 0.88 | 0.87 | 0.89 | 0.87 | 0.85 | 0.86 | 0.89 |
CNN-based | 0.89 | 0.88 | 0.87 | 0.90 | 0.91 | 0.88 | 0.89 |
ONE-PEACE | 0.91 | 0.89 | 0.92 | 0.90 | 0.91 | 0.89 | 0.90 |
BEiT | 0.90 | 0.88 | 0.89 | 0.88 | 0.90 | 0.87 | 0.88 |
InternImage | 0.92 | 0.91 | 0.93 | 0.91 | 0.92 | 0.90 | 0.92 |
Proposed Method | 0.93 | 0.92 | 0.93 | 0.91 | 0.92 | 0.90 | 0.91 |

Model | Frames Per Second (FPS) |
---|---|
Tiny-Segformer [29] | 41 |
AFU-Net [52] | 20 |
Mask R-CNN [53] | 19 |
U-Net+ [54] | 21 |
CNN-based [55] | 17 |
Proposed Method—without Sensor Data | 46 |
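
Throughput figures are easier to compare with a real-time budget once converted to per-frame latency. The short sketch below does that conversion for the FPS values in the table above; the labels are shortened and the relative speed-up column is an added illustration, not a figure from the source.

```python
# Frames per second copied from the mobile deployment table above.
fps = {
    "Tiny-Segformer": 41,
    "AFU-Net": 20,
    "Mask R-CNN": 19,
    "U-Net+": 21,
    "CNN-based": 17,
    "Proposed (no sensor data)": 46,
}

# Average time spent on one frame, in milliseconds.
latency_ms = {model: 1000.0 / rate for model, rate in fps.items()}

# Throughput of the proposed model relative to each baseline.
proposed = fps["Proposed (no sensor data)"]
speedup = {model: proposed / rate for model, rate in fps.items()}

for model in fps:
    print(f"{model}: {latency_ms[model]:.2f} ms/frame, "
          f"proposed is {speedup[model]:.2f}x as fast")
```

At 46 FPS the proposed model needs about 21.7 ms per frame, compared with about 24.4 ms for Tiny-Segformer, the fastest baseline in the table.
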

Test Field | Precision | Recall | Accuracy | F1-Score |
---|---|---|---|---|
Wuyuan (trained on Qixia + Luochuan) | 0.96 | 0.93 | 0.95 | 0.94 |
Qixia (trained on Wuyuan + Luochuan) | 0.95 | 0.94 | 0.96 | 0.94 |
Luochuan (trained on Wuyuan + Qixia) | 0.96 | 0.92 | 0.95 | 0.94 |
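
The protocol in the table above, where each field is held out in turn and the model is trained on the other two, can be sketched as a leave-one-field-out split. The sample records and the `field` key below are hypothetical placeholders; only the three field names come from the source.

```python
FIELDS = ["Wuyuan", "Qixia", "Luochuan"]


def leave_one_field_out(samples):
    """Yield (held_out_field, train_samples, test_samples) for each field in turn."""
    for held_out in FIELDS:
        train = [s for s in samples if s["field"] != held_out]
        test = [s for s in samples if s["field"] == held_out]
        yield held_out, train, test


# Hypothetical toy records; real samples would carry images and sensor readings.
samples = [
    {"id": 0, "field": "Wuyuan"},
    {"id": 1, "field": "Qixia"},
    {"id": 2, "field": "Luochuan"},
    {"id": 3, "field": "Qixia"},
]

splits = list(leave_one_field_out(samples))
for held_out, train, test in splits:
    train_fields = sorted({s["field"] for s in train})
    print(f"test on {held_out}, train on {train_fields}: "
          f"{len(train)} train / {len(test)} test samples")
```

Splitting by field rather than by random sample keeps every test image from an orchard the model has never seen, which is what makes the results a generalization test.
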
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhou, C.; Ge, X.; Chang, Y.; Wang, M.; Shi, Z.; Ji, M.; Wu, T.; Lv, C. A Multimodal Parallel Transformer Framework for Apple Disease Detection and Severity Classification with Lightweight Optimization. Agronomy 2025, 15, 1246. https://doi.org/10.3390/agronomy15051246