A Unified Self-Supervised Framework for Plant Disease Detection on Laboratory and In-Field Images
Abstract
1. Introduction
- We propose a unified self-supervised framework integrating BYOL, MIM, and contrastive learning.
- We design a hybrid loss function that captures global, local, and instance-level information.
- We implement an efficient GPU-based augmentation pipeline for robust representation learning.
- We achieve state-of-the-art performance on the challenging PlantDoc and PlantVillage datasets without using labels during pretraining.
- We demonstrate strong generalization via transfer learning and provide interpretability through Grad-CAM and t-SNE.
2. Methodology
2.1. Overview of the Proposed Framework
- BYOL, where an online encoder learns to predict the projection of a momentum-updated target encoder;
- MIM, where masked inputs are reconstructed using a combination of pixel-wise reconstruction error and perceptual similarity;
- Contrastive learning, which brings positive pairs closer and pushes negative pairs apart in the embedding space.
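To make the interplay of the three branches concrete, the sketch below (our illustration under stated assumptions, not the authors' released code) shows how one shared online encoder can feed all three objectives, with a momentum-updated target branch for BYOL. Head dimensions follow the implementation table in Section 2.7; the momentum value tau is an assumption.

```python
import copy
import torch
import torch.nn as nn

class UnifiedSSL(nn.Module):
    """Illustrative skeleton: one backbone shared by BYOL, MIM, and contrastive branches."""
    def __init__(self, backbone, feat_dim=2048, proj_dim=256):
        super().__init__()
        self.encoder = backbone  # e.g., a ResNet-101 trunk returning 2048-d features
        self.projector = nn.Sequential(               # BYOL projection head (2048->512->256)
            nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, proj_dim))
        self.predictor = nn.Sequential(               # BYOL prediction head (256->512->256)
            nn.Linear(proj_dim, 512), nn.ReLU(), nn.Linear(512, proj_dim))
        # Momentum-updated target branch: an EMA copy, never trained by gradients.
        self.target_encoder = copy.deepcopy(backbone)
        self.target_projector = copy.deepcopy(self.projector)
        for p in self.target_encoder.parameters():
            p.requires_grad = False
        for p in self.target_projector.parameters():
            p.requires_grad = False

    @torch.no_grad()
    def update_target(self, tau=0.996):
        """EMA update of the target branch (tau = 0.996 is an assumed momentum)."""
        pairs = zip(
            list(self.encoder.parameters()) + list(self.projector.parameters()),
            list(self.target_encoder.parameters()) + list(self.target_projector.parameters()))
        for online_p, target_p in pairs:
            target_p.data.mul_(tau).add_(online_p.data, alpha=1.0 - tau)

    def forward(self, view1, view2, masked_view):
        # BYOL: the online prediction of view1 chases the target projection of view2.
        p1 = self.predictor(self.projector(self.encoder(view1)))
        with torch.no_grad():
            z_target = self.target_projector(self.target_encoder(view2))
        # Contrastive: online projections of both views act as instance embeddings.
        z1 = self.projector(self.encoder(view1))
        z2 = self.projector(self.encoder(view2))
        # MIM: features of the masked view feed a reconstruction decoder (omitted here).
        masked_feats = self.encoder(masked_view)
        return p1, z_target, z1, z2, masked_feats
```

The five outputs feed the loss terms defined in Section 2.4.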
2.2. Model Architecture
2.2.1. Backbone Network and BYOL Framework
2.2.2. Masked Image Modeling (MIM)
2.2.3. Contrastive Learning
2.3. GPU-Based Augmentation Pipeline
- Random resized cropping (scale: 60–100%)
- Random horizontal flipping (50% probability)
- Random rotation (±20 degrees)
- Random solarization (threshold = 0.5, 20% probability)
- Random grayscale conversion (20% probability)
- Color jittering (brightness, contrast, saturation, hue)
- Random Gaussian blur (σ ∈ [0.1, 2.0], 30% probability)
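The parameters above, together with the normalization constants in Section 2.7, map onto Kornia's GPU-resident augmentation modules. The sketch below is one plausible assembly; the blur kernel size and the color-jitter magnitudes are our assumptions, since the paper specifies only σ and the probabilities.

```python
import torch
import kornia.augmentation as K

# GPU augmentation pipeline built from the probabilities/ranges listed above.
augment = K.AugmentationSequential(
    K.RandomResizedCrop(size=(384, 384), scale=(0.6, 1.0)),
    K.RandomHorizontalFlip(p=0.5),
    K.RandomRotation(degrees=20.0),
    # Kornia interprets a float x as sampling the threshold from 0.5 ± x,
    # so thresholds=0.0 pins the solarization threshold at 0.5.
    K.RandomSolarize(thresholds=0.0, p=0.2),
    K.RandomGrayscale(p=0.2),
    K.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),  # assumed strengths
    K.RandomGaussianBlur(kernel_size=(23, 23), sigma=(0.1, 2.0), p=0.3),  # assumed kernel size
    K.Normalize(mean=torch.tensor([0.485, 0.456, 0.406]),
                std=torch.tensor([0.229, 0.224, 0.225])),
)

# Applied batch-wise on GPU: two independent passes give the two BYOL views.
images = torch.rand(16, 3, 384, 384, device="cuda")  # toy batch
view1, view2 = augment(images), augment(images)
```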
2.4. Hybrid Loss Function
- promotes alignment between different views of the same image;
- enhances spatial awareness by reconstructing masked regions;
- encourages separation between different instances.
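Under the weights and temperature reported in Section 2.7 (1.0/0.6/0.6, τ = 0.1), the three terms above can be sketched as follows. The 50/50 MSE-SSIM blend inside the MIM term is our assumption, since the paper states only that the two are balanced.

```python
import torch
import torch.nn.functional as F
from kornia.losses import ssim_loss  # structural-similarity term for the MIM loss

def byol_loss(p, z):
    """Cosine-alignment loss between online prediction p and target projection z."""
    p, z = F.normalize(p, dim=-1), F.normalize(z, dim=-1)
    return (2.0 - 2.0 * (p * z).sum(dim=-1)).mean()

def mim_loss(recon, target, alpha=0.5):
    """Masked-reconstruction loss: MSE blended with SSIM (alpha = 0.5 is assumed)."""
    return (alpha * F.mse_loss(recon, target)
            + (1.0 - alpha) * ssim_loss(recon, target, window_size=11))

def info_nce(z1, z2, tau=0.1):
    """InfoNCE: matching batch indices are positives, all other pairs negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def hybrid_loss(p1, z_target, z1, z2, recon, original):
    """L_Total = 1.0 * L_BYOL + 0.6 * L_MIM + 0.6 * L_Contrastive (Section 2.7)."""
    return (1.0 * byol_loss(p1, z_target)
            + 0.6 * mim_loss(recon, original)
            + 0.6 * info_nce(z1, z2))
```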
2.5. Optimization Strategy
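The implementation table in Section 2.7 specifies AdamW with weight decay 1 × 10⁻⁵ and a linear warmup from 1 × 10⁻⁶ to 1 × 10⁻⁴ over the first 10 of 90 epochs. A minimal sketch of that schedule follows; holding the rate constant after warmup is our assumption, since the table states no decay rule.

```python
import torch
from torchvision.models import resnet101

model = resnet101()  # stand-in for the full framework in this isolated sketch
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-5)

def warmup(epoch, warmup_epochs=10, start=1e-6, peak=1e-4):
    """Linear warmup from 1e-6 to 1e-4 over 10 epochs; constant afterwards (assumed)."""
    if epoch < warmup_epochs:
        return (start + (peak - start) * epoch / warmup_epochs) / peak
    return 1.0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup)
for epoch in range(90):          # 90 epochs, per the implementation table
    # ... one training epoch ...
    scheduler.step()
```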
2.6. Transfer Learning Evaluation on PlantVillage Dataset
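For the transfer evaluation, a recipe consistent with the architecture table (ResNet-101 trunk, 2048-d features, two-layer MLP head) is to load the self-supervised weights and attach a fresh classifier sized for PlantVillage's 38 classes. The snippet below is a hypothetical illustration; the checkpoint filename is a placeholder.

```python
import torch
import torch.nn as nn
import torchvision

# Reuse the self-supervised trunk; the checkpoint path is illustrative only.
backbone = torchvision.models.resnet101()
backbone.fc = nn.Identity()                   # expose 2048-d pooled features
# backbone.load_state_dict(torch.load("ssl_resnet101.pt"), strict=False)

head = nn.Sequential(                         # mirrors the paper's fine-tuning MLP
    nn.Linear(2048, 512), nn.ReLU(),
    nn.Linear(512, 38))                       # 38 PlantVillage classes
model = nn.Sequential(backbone, head)
```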
2.7. Implementation Details
3. Results and Discussion
3.1. Experimental Dataset
- PlantDoc: This is a challenging real-world dataset consisting of 2,598 images spanning 13 plant species and 27 classes, including both diseased and healthy leaf samples. The images were captured under natural field conditions and exhibit considerable variability in lighting, background clutter, leaf orientation, and symptom presentation. The dataset includes crops such as apple, grape, cotton, and maize, providing a comprehensive benchmark for assessing model robustness in practical agricultural scenarios. Example images from the PlantDoc dataset are shown in Figure 2.
- PlantVillage: In contrast, the PlantVillage dataset comprises over 54,000 images representing 38 classes, including a wide variety of healthy and diseased leaf types. The images are captured under controlled laboratory conditions with uniform backgrounds and consistent lighting, making this dataset particularly suitable for evaluating model generalization in clean, noise-free environments. Example images from the PlantVillage dataset are shown in Figure 3.
3.2. Evaluation Metrics
- Accuracy: Overall proportion of correct predictions across all classes.
- Precision: Measures the exactness of the model by computing the proportion of correctly predicted positive instances out of all predicted positives.
- Recall: Also known as sensitivity, it evaluates the model’s ability to identify all actual positive instances.
- F1-score: The harmonic mean of precision and recall, providing a balanced view of performance, especially important in imbalanced datasets.
- Confusion Matrix: Visualizes the relationship between predicted and actual class labels, revealing common misclassification patterns and class-wise accuracy.
- Grad-CAM (Gradient-weighted Class Activation Mapping): Provides visual interpretability by highlighting image regions that are most influential in the model’s decision-making process.
- t-SNE (t-distributed Stochastic Neighbor Embedding): A dimensionality reduction technique used to visualize high-dimensional feature embeddings in a two-dimensional space, enabling inspection of inter-class separability and clustering behavior.
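These metrics can be reproduced from model outputs with standard tooling; the sketch below uses scikit-learn (our choice of library, not named by the paper) on toy data to compute the macro-averaged scores, the confusion matrix, and a 2-D t-SNE projection.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 27, 500)      # toy labels for the 27 PlantDoc classes
y_pred = rng.integers(0, 27, 500)      # toy predictions

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
cm = confusion_matrix(y_true, y_pred)  # rows = actual classes, columns = predicted

# t-SNE: project high-dimensional embeddings (e.g., 2048-d ResNet features) to 2-D.
features = rng.standard_normal((500, 2048)).astype(np.float32)
coords = TSNE(n_components=2, perplexity=30).fit_transform(features)
```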
3.3. Performance on PlantDoc
3.3.1. Per-Class Performance
3.3.2. Confusion Matrix Analysis
3.3.3. Model Interpretability via Grad-CAM
3.4. Comparison with Existing Models
Model | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|
Mobilenet [22] | 55.24 | 52.57 | 53.82 |
ResNet | 67.36 | 65.34 | 66.28 |
GoogLeNet [20] | 74.31 | 69.19 | 71.61 |
DenseNet [21] | 69.26 | 66.17 | 67.61 |
ShuffleNet V2 [23] | 72.28 | 71.24 | 71.70 |
MobileViT [24] | 72.55 | 68.22 | 70.32 |
Vision Transformer | 54.35 | 56.77 | 55.53 |
Swin Transformer | 75.85 | 69.91 | 72.76 |
T-CNN (ResNet-101) [25] | 74.44 | - | - |
ICVT [26] | 77.23 | - | - |
Efficient Swin Transformer [19] | 80.84 | 76.72 | 78.16 |
Ours (BYOL + MIM + Contrastive) | 80.00 | 78.24 | 77.48 |
3.5. Ablation Study
3.6. Transferability to PlantVillage
3.6.1. Transfer Performance on PlantVillage
3.6.2. Training Dynamics
3.6.3. Feature Space Visualization with t-SNE
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Xu, M.; Kim, H.; Yang, J.; Fuentes, A.; Meng, Y. Embracing limited and imperfect training datasets: Opportunities and challenges in plant disease recognition using deep learning. Front. Plant Sci. 2023, 14, 1225409. [Google Scholar] [CrossRef] [PubMed]
- Pujari, J.; Yakkundimath, R.; Byadgi, A. Image Processing Based Detection of Fungal Diseases in Plants. Procedia Comput. Sci. 2015, 46, 1802–1808. [Google Scholar] [CrossRef]
- Too, E.; Yujian, L.; Njuki, S.; Yingchun, L. A comparative study of fine-tuning deep learning models for plant disease identification. Comput. Electron. Agric. 2019, 161, 272–279. [Google Scholar] [CrossRef]
- Che, C.; Xue, N.; Li, Z.; Zhao, Y.; Huang, X. Automatic cassava disease recognition using object segmentation and progressive learning. PeerJ Comput. Sci. 2025, 11, e2721. [Google Scholar] [CrossRef]
- Zhang, J.; Cao, Y.; Xu, D. Uncertainty-aware Masked Modeling in Medical Imaging. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 6–11 April 2025; pp. 1–5. [Google Scholar] [CrossRef]
- Oquab, M.; Darcet, T.; Moutakanni, T.; Vo, H.; Szafraniec, M.; Khalidov, V.; Fernandez, P.; Haziza, D.; Massa, F.; El-Nouby, A.; et al. DINOv2: Learning Robust Visual Features without Supervision. arXiv 2023, arXiv:2304.07193. [Google Scholar] [CrossRef]
- Mamun, A.A.; Ahmedt-Aristizabal, D.; Zhang, M.; Ismail Hossen, M.; Hayder, Z.; Awrangjeb, M. Plant Disease Detection Using Self-supervised Learning: A Systematic Review. IEEE Access 2024, 12, 171926–171943. [Google Scholar] [CrossRef]
- Wang, Z.; Wang, R.; Wang, M.; Lai, T.; Zhang, M. Self-supervised Transformer-Based Pre-training Method with General Plant Infection Dataset. In Lecture Notes in Computer Science, Proceedings of the Pattern Recognition and Computer Vision (PRCV), Urumqi, China, 18–20 October 2024; Springer Nature: Singapore, 2025; Volume 15032. [Google Scholar] [CrossRef]
- Gustineli, M.; Miyaguchi, A.; Stalter, I. Multi-Label Plant Species Classification with Self-Supervised Vision Transformers. arXiv 2024, arXiv:2407.06298. [Google Scholar] [CrossRef]
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning (ICML 2020), Virtual Event, 13–18 July 2020; pp. 1597–1607. [Google Scholar] [CrossRef]
- Grill, J.-B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.H.; Buchatskaya, E.; Doersch, C.; Avila Pires, B.; Guo, Z.; Gheshlaghi Azar, M.; et al. Bootstrap your own latent: A new approach to self-supervised learning. In Proceedings of the Neural Information Processing Systems 33, Vancouver, BC, Canada, 6–12 December 2020; pp. 21271–21284. Available online: https://proceedings.neurips.cc/paper_files/paper/2020/file/f3ada80d5c4ee70142b17b8192b2958e-Paper.pdf (accessed on 1 March 2025).
- Singh, D.; Jain, N.; Kayal, P.; Sinha, A. PlantDoc: A Dataset for Visual Plant Disease Detection. In Proceedings of the 7th ACM IKDD Conference on Data Science (CoDS) and the 25th Conference on Management of Data (COMAD), Hyderabad, India, 5–7 January 2020. [Google Scholar] [CrossRef]
- Hughes, D.P.; Salathé, M. An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv 2015, arXiv:1511.08060. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Li, F.F. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar] [CrossRef]
- He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked Autoencoders Are Scalable Vision Learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 15979–15988. [Google Scholar] [CrossRef]
- Oord, A.; Li, Y.; Vinyals, O. Representation Learning with Contrastive Predictive Coding. arXiv 2018, arXiv:1807.03748. [Google Scholar] [CrossRef]
- Riba, E.; Mishkin, D.; Ponsa, D.; Rublee, E.; Bradski, G. Kornia: An Open Source Differentiable Computer Vision Library for PyTorch. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass, CO, USA, 1–5 March 2020; pp. 3663–3672. [Google Scholar] [CrossRef]
- Liu, W.; Zhang, A. Plant Disease Detection Algorithm Based on Efficient Swin Transformer. Comput. Mater. Contin. 2025, 82, 3045–3068. [Google Scholar] [CrossRef]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar] [CrossRef]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar] [CrossRef]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
- Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Lecture Notes in Computer Science, Proceedings of the European Conference on Computer Vision (ECCV 2018), Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer: Berlin/Heidelberg, Germany, 2018; Volume 11218, pp. 122–138. [Google Scholar] [CrossRef]
- Mehta, S.; Rastegari, M. MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer. arXiv 2021, arXiv:2110.02178. [Google Scholar] [CrossRef]
- Wang, D.; Wang, J.; Li, W.; Guan, P. T-CNN: Trilinear Convolutional Neural Networks Model for Visual Detection of Plant Diseases. Comput. Electron. Agric. 2021, 190, 106468. [Google Scholar] [CrossRef]
- Yu, S.; Xie, L.; Huang, Q. Inception Convolutional Vision Transformers for Plant Disease Identification. Internet Things 2023, 21, 100650. [Google Scholar] [CrossRef]
- Gokulnath, B.; Usha, D.G. Identifying and classifying plant disease using resilient LF-CNN. Ecol. Inform. 2021, 63, 101283. [Google Scholar] [CrossRef]
- Vo, H.-T.; Quach, L.-D.; Hoang, T. Ensemble of Deep Learning Models for Multi-plant Disease Classification in Smart Farming. Int. J. Adv. Comput. Sci. Appl. 2023, 14. [Google Scholar] [CrossRef]
- Kaya, Y.; Gürsoy, E. A novel multi-head CNN design to identify plant diseases using the fusion of RGB images. Ecol. Inform. 2023, 75, 101998. [Google Scholar] [CrossRef]
- Mohanty, S.; Hughes, D.; Salathé, M. Using Deep Learning for Image-Based Plant Disease Detection. Front. Plant Sci. 2016, 7, 1419. [Google Scholar] [CrossRef]
- Atila, Ü.; Uçar, M.; Akyol, K.; Uçar, E. Plant leaf disease classification using EfficientNet deep learning model. Ecol. Inform. 2021, 61, 101182. [Google Scholar] [CrossRef]
- Ali, A.H.; Youssef, A.; Abdelal, M.; Raja, M.A. An ensemble of deep learning architectures for accurate plant disease classification. Ecol. Inform. 2024, 81, 102618. [Google Scholar] [CrossRef]
- Ouamane, A.; Chouchane, A.; Himeur, Y.; Miniaoui, S.; Zaguia, A. Optimized vision transformers for superior plant disease detection. IEEE Access 2025, 13, 39165–39181. [Google Scholar] [CrossRef]
Category | Setting/Description |
---|---|
Backbone Network | ResNet-101 (pretrained on ImageNet) |
Input Resolution | 384 × 384 |
BYOL Projection Head | Linear (2048 → 512) → ReLU → Linear (512 → 256) |
BYOL Prediction Head | Linear (256 → 512) → ReLU → Linear (512 → 256) |
MLP Classifier (used during fine-tuning) | Linear (2048 → 512) → ReLU → Linear (512 → 27) |
Augmentation (GPU) | Kornia-based stochastic transforms, applied on GPU |
Random Crop (scale) | 0.6–1.0 |
Horizontal Flip | p = 0.5 |
Rotation | ±20° |
Solarization | Threshold = 0.5, p = 0.2 |
Grayscale Conversion | p = 0.2 |
Color Jitter | Brightness, contrast, saturation, and hue |
Gaussian Blur | σ ∈ [0.1, 2.0], p = 0.3 |
Normalization (mean/std) | Mean = [0.485, 0.456, 0.406]; Std = [0.229, 0.224, 0.225] |
Training Configuration | |
Epochs | 90 |
Batch Size | 16 |
Optimizer | AdamW |
Learning Rate | Warmup from 1 × 10⁻⁶ to 1 × 10⁻⁴ over the first 10 epochs
Weight Decay | 1 × 10⁻⁵
Gradient Accumulation | Every 2 batches |
Mixed Precision | Enabled via torch.cuda.amp |
Checkpointing | Every 5 epochs |
Loss Weights | |
BYOL Loss | 1.0—Cosine similarity between online prediction and target projection. |
MIM Loss | 0.6—Combines pixel-level differences and perceptual similarity, balancing mean squared error (MSE) with structural similarity (SSIM).
Contrastive Loss | 0.6—InfoNCE loss with a temperature of τ = 0.1 for instance discrimination. |
Final Training Loss | L_Total = 1.0·L_BYOL + 0.6·L_MIM + 0.6·L_Contrastive
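Putting the table together, one pretraining step with mixed precision and two-batch gradient accumulation might look like the sketch below; `loader`, `optimizer`, and `compute_hybrid_loss` are placeholders tying together the illustrative snippets from Sections 2.1 through 2.5.

```python
import torch

scaler = torch.cuda.amp.GradScaler()
accum_steps = 2                                  # gradient accumulation every 2 batches

model.cuda().train()
for step, (images, _) in enumerate(loader):      # labels are ignored during pretraining
    images = images.cuda(non_blocking=True)
    with torch.cuda.amp.autocast():              # mixed-precision forward pass
        view1, view2 = augment(images), augment(images)
        loss = compute_hybrid_loss(model, view1, view2)  # placeholder wrapper
        loss = loss / accum_steps                # rescale for accumulation
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)                   # unscale gradients + optimizer step
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
        model.update_target()                    # EMA update of the BYOL target branch
```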
# | Class | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
0 | Apple Scab Leaf | 72.73 | 80.00 | 76.19 |
1 | Apple leaf | 57.14 | 88.89 | 69.57 |
2 | Apple rust leaf | 88.89 | 72.73 | 80.00 |
3 | Bell_pepper leaf | 85.71 | 75.00 | 80.00 |
4 | Bell_pepper leaf spot | 83.33 | 100.00 | 90.91 |
5 | Blueberry leaf | 90.00 | 81.82 | 85.71 |
6 | Cherry leaf | 83.33 | 50.00 | 62.50 |
7 | Corn gray leaf spot | 18.18 | 50.00 | 26.67 |
8 | Corn leaf blight | 71.43 | 41.67 | 52.63 |
9 | Corn rust leaf | 81.82 | 90.00 | 85.71 |
10 | Peach leaf | 100.00 | 77.78 | 87.50 |
11 | Potato leaf early blight | 62.50 | 62.50 | 62.50 |
12 | Potato leaf late blight | 62.50 | 62.50 | 62.50 |
13 | Raspberry leaf | 87.50 | 100.00 | 93.33 |
14 | Soyabean leaf | 100.00 | 87.50 | 93.33 |
15 | Squash powdery mildew leaf | 100.00 | 100.00 | 100.00 |
16 | Strawberry leaf | 100.00 | 100.00 | 100.00 |
17 | Tomato early blight leaf | 72.73 | 88.89 | 80.00 |
18 | Tomato septoria leaf spot | 78.57 | 100.00 | 88.00 |
19 | Tomato leaf | 57.14 | 50.00 | 53.33 |
20 | Tomato leaf bacterial spot | 83.33 | 50.00 | 62.50 |
21 | Tomato leaf late blight | 77.78 | 70.00 | 73.68 |
22 | Tomato leaf mosaic virus | 100.00 | 50.00 | 66.67 |
23 | Tomato leaf yellow virus | 100.00 | 100.00 | 100.00 |
24 | Tomato mold leaf | 45.45 | 83.33 | 58.82 |
25 | Grape leaf | 100.00 | 100.00 | 100.00 |
26 | Grape leaf black rot | 100.00 | 100.00 | 100.00 |
Macro Average | 80.00 | 78.24 | 77.48 |
Method | Accuracy (%) | Macro Precision (%) | Macro Recall (%) | Macro F1-Score (%) |
---|---|---|---|---|
SimCLR | 73.31 | 74.12 | 73.25 | 72.53 |
DINOv2 | 73.31 | 74.55 | 73.24 | 72.78 |
BYOL | 74.58 | 76.70 | 74.95 | 74.52 |
BYOL + Contrastive | 74.20 | 75.90 | 74.10 | 72.90 |
BYOL + MIM | 74.58 | 76.28 | 74.40 | 73.17 |
BYOL + MIM + Contrastive | 77.82 | 80.00 | 78.24 | 77.48 |
Reference | Dataset | Accuracy (%) | Contribution |
---|---|---|---|
[3] | PlantVillage | 99.75 | Performance evaluation of DenseNet compared to other deep learning models |
[27] | PlantVillage | 98.93 | Combined multiple loss functions within a CNN-based framework |
[28] | PlantVillage | 99.70 | Developed an ensemble of two distinct deep learning models |
[29] | PlantVillage | 98.17 | Integrated DenseNet with both RGB and segmented leaf image inputs |
[30] | PlantVillage | 99.31 | Applied a basic deep CNN model for plant disease classification |
[31] | PlantVillage | 99.91 | Improved accuracy through extensive augmentation and deeper architecture |
[32] | PlantVillage | 99.89 | Utilized image preprocessing with a 10-model ensemble to maximize performance |
[33] | PlantVillage | 99.77 | Optimized Vision Transformers with multiscale attention and targeted preprocessing |
Proposed (Ours) | PlantVillage | 99.85 | Self-supervised ResNet101 trained with BYOL, MIM, and contrastive learning |