Challenges and Opportunities in Tomato Leaf Disease Detection with Limited and Multimodal Data: A Review
Abstract
1. Introduction
1.1. Background and Significance
1.2. Definition of Core Concepts
1.2.1. Limited Data in Tomato Leaf Disease Detection
1.2.2. Multimodal Data in Tomato Leaf Disease Detection
1.2.3. Relationship Between Limited Data and Multimodal Data
1.2.4. Mathematical Problem Formulation
1.3. Review Methodology
2. Challenges in Tomato Leaf Disease Detection: Limited and Multimodal Data
2.1. Challenges from Limited Data
2.1.1. Small Sample Size
2.1.2. Class Imbalance
2.1.3. Low-Quality and Noisy Samples
2.1.4. Annotation Noise and Inter-Annotator Variability
- Class noise, where the primary disease label is incorrect (e.g., early blight labeled as late blight, or TYLCV confused with other yellowing disorders).
- Boundary noise, where lesion regions in detection or segmentation tasks are imprecisely delineated, often due to occlusions or low resolution.
- Severity noise, where ordinal severity scores (e.g., mild, moderate, severe) exhibit high variability between annotators or annotation sessions.
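Severity noise is commonly quantified with an agreement statistic. The following is a minimal sketch, assuming two annotators grade the same leaves on a hypothetical three-level ordinal scale (0 = mild, 1 = moderate, 2 = severe). It computes the quadratic-weighted Cohen's kappa, where values near 1 indicate consistent grading and values near 0 indicate chance-level agreement. The function name and toy grades are illustrative and are not taken from any reviewed study.

```python
from collections import Counter
from typing import Sequence


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int], num_levels: int) -> float:
    """Quadratic-weighted Cohen's kappa between two annotators' ordinal grades."""
    if len(a) != len(b) or not a:
        raise ValueError("annotation lists must be non-empty and of equal length")
    n = len(a)
    observed = Counter(zip(a, b))           # joint counts of (grade_a, grade_b)
    hist_a, hist_b = Counter(a), Counter(b)  # marginal grade counts per annotator
    max_sq = (num_levels - 1) ** 2
    obs_disagree = 0.0  # weighted disagreement actually observed
    exp_disagree = 0.0  # weighted disagreement expected if grades were independent
    for i in range(num_levels):
        for j in range(num_levels):
            w = (i - j) ** 2 / max_sq  # quadratic penalty: mild-vs-severe costs more than mild-vs-moderate
            obs_disagree += w * observed[(i, j)] / n
            exp_disagree += w * hist_a[i] * hist_b[j] / (n * n)
    # If both annotators used one identical grade throughout, kappa is undefined;
    # treat that degenerate case as perfect agreement.
    return 1.0 - obs_disagree / exp_disagree if exp_disagree > 0 else 1.0


print(quadratic_weighted_kappa([0, 1, 2, 2, 1], [0, 2, 2, 1, 1], num_levels=3))
```

Quadratic weights are a common choice for ordinal severity scales because they penalize large grade jumps more than adjacent-grade confusions.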
2.2. Challenges from Multimodal Data
2.2.1. Heterogeneity of Multimodal Data
2.2.2. Difficulty of Cross-Modal Alignment
2.2.3. Model Complexity and Resource Constraints
2.3. Interaction Between Limited and Multimodal Data Challenges
3. Technical Solutions for Limited Data in Tomato Leaf Disease Detection
3.1. Transfer Learning
3.1.1. Principle and Implementation
3.1.2. Application Cases and Performance
3.1.3. Advantages and Limitations
3.2. Self-Distillation and Ensemble Learning
3.2.1. Self-Distillation for Data Efficiency
3.2.2. Ensembles Under Class Imbalance
3.2.3. Benefits and Trade-Offs
3.3. Data Augmentation
3.3.1. Classical and Generative Augmentation Strategies
3.3.2. Effectiveness and Limitations Under Limited Data
3.4. Self-Supervised Representation Learning
3.4.1. Core Idea and Objectives
3.4.2. Representative Paradigms
- Contrastive SSL (SimCLR/MoCo-style): pulls together two augmented views of the same leaf and pushes apart views of different leaves, improving invariance to illumination and background.
- Masked image modeling (ViT-style): randomly masks image patches and reconstructs them, capturing global leaf structure and context under occlusion.
- Multimodal SSL (image–text alignment): applies contrastive objectives to paired image–description data, enabling CLIP-like retrieval and zero-shot diagnosis.
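The contrastive objective in the first item can be made concrete with a minimal NumPy sketch of the NT-Xent (InfoNCE-style) loss popularized by SimCLR. It assumes `z1` and `z2` are embeddings of two augmented views of the same batch of leaf images, with row i of each array coming from the same leaf. The encoder and augmentation pipeline are omitted, and the temperature of 0.5 is an illustrative default rather than a recommended setting.

```python
import numpy as np


def nt_xent_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
    """NT-Xent loss for two views of a batch; z1 and z2 have shape (batch, dim)."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors, so dot product = cosine
    sim = z @ z.T / temperature                       # (2B, 2B) similarity logits
    np.fill_diagonal(sim, -np.inf)                    # a view is never compared with itself
    b = z1.shape[0]
    # The positive for row i is the other view of the same leaf.
    pos_idx = np.concatenate([np.arange(b, 2 * b), np.arange(0, b)])
    # Numerically stable row-wise log-softmax; every other view acts as a negative.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * b), pos_idx].mean())
```

As a sanity check, two nearly identical views of the same embeddings should yield a lower loss than a pairing with unrelated embeddings. Minimizing this loss is what drives the illumination and background invariance described above.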
3.4.3. Benefits and Limitations Under Limited Data
3.5. Few- and Zero-Shot Learning for Rare Diseases
3.5.1. Problem Setting
3.5.2. Metric-Based and Meta-Learning Approaches
3.5.3. Zero-Shot Learning with Symptom Semantics
3.6. Domain Generalization and Domain Adaptation
3.6.1. Domain Shift in Tomato Leaf Datasets
3.6.2. Domain Adaptation (DA)
- Feature alignment: learning domain-invariant features by minimizing discrepancies between feature distributions of source and target leaves.
- Style transfer and image translation: using GAN-based models to translate source images into target-like appearances (field style), thereby augmenting training data for robust disease recognition under target conditions.
- Self-training: iteratively assigning pseudo-labels to confident target samples and retraining the model to better fit the target domain.
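For the feature-alignment strategy in the first item, one widely used discrepancy measure is the maximum mean discrepancy (MMD). The sketch below is a minimal NumPy version that assumes an RBF kernel with a hand-picked bandwidth. It computes the biased squared-MMD estimate between source (e.g., lab) and target (e.g., field) feature batches. In a DA model, this value would be added to the classification loss and minimized. The feature arrays are synthetic placeholders.

```python
import numpy as np


def rbf_kernel(x: np.ndarray, y: np.ndarray, bandwidth: float) -> np.ndarray:
    # Pairwise squared Euclidean distances, then a Gaussian kernel.
    sq = (x ** 2).sum(axis=1)[:, None] + (y ** 2).sum(axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def mmd2(source: np.ndarray, target: np.ndarray, bandwidth: float = 2.0) -> float:
    """Biased estimate of the squared MMD between two feature samples."""
    k_ss = rbf_kernel(source, source, bandwidth).mean()
    k_tt = rbf_kernel(target, target, bandwidth).mean()
    k_st = rbf_kernel(source, target, bandwidth).mean()
    return float(k_ss + k_tt - 2.0 * k_st)


rng = np.random.default_rng(0)
lab = rng.normal(0.0, 1.0, size=(200, 4))    # stand-in for lab-image features
field = rng.normal(1.0, 1.0, size=(200, 4))  # shifted stand-in for field-image features
same = rng.normal(0.0, 1.0, size=(200, 4))   # fresh sample from the lab distribution
print(mmd2(lab, field), mmd2(lab, same))     # the shifted pair should score clearly higher
```

In practice, the bandwidth is often set from the median pairwise distance or replaced by a sum of kernels at several bandwidths. The fixed value here is only for illustration.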
3.6.3. Domain Generalization (DG)
- Data-level diversification: constructing composite training sets from multiple datasets (PlantVillage, PlantDoc, Tomato-Village, etc.) and applying strong augmentations to simulate wider domain variability.
- Representation learning with invariance: enforcing invariance across source domains via regularization or contrastive objectives, so that disease-discriminative features are insensitive to domain-specific factors such as background and imaging device.
- Meta-learning for domains: treating each source domain as a meta-task and training models that can quickly adapt to novel domains using a small calibration set, which is particularly suitable for deployment on new farms with a limited number of annotated images.
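Whichever DG strategy is used, it is typically evaluated with a leave-one-domain-out protocol: train on all source domains except one, then test on the held-out domain. The following is a minimal standard-library sketch, assuming each sample record carries a `domain` field. The record layout, file names, and labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Sample = Dict[str, str]  # hypothetical record: {"path": ..., "label": ..., "domain": ...}


def leave_one_domain_out(
    samples: List[Sample],
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held_out_domain, train_split, test_split) once per domain."""
    by_domain: Dict[str, List[Sample]] = defaultdict(list)
    for s in samples:
        by_domain[s["domain"]].append(s)
    for held_out in sorted(by_domain):  # deterministic order of folds
        train = [s for d, group in by_domain.items() if d != held_out for s in group]
        yield held_out, train, list(by_domain[held_out])


data = [
    {"path": "a.jpg", "label": "early_blight", "domain": "PlantVillage"},
    {"path": "b.jpg", "label": "late_blight", "domain": "PlantDoc"},
    {"path": "c.jpg", "label": "healthy", "domain": "Tomato-Village"},
    {"path": "d.jpg", "label": "healthy", "domain": "PlantVillage"},
]
for domain, train_split, test_split in leave_one_domain_out(data):
    print(domain, len(train_split), len(test_split))
```

Reporting accuracy per held-out domain, rather than one pooled number, makes the lab-to-field gap visible.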
3.7. Active Learning and Human-in-the-Loop Annotation
- Uncertainty-based sampling, which prioritizes samples with high predictive entropy or a small margin between the two most probable classes, on the intuition that the model is currently unsure about these images.
- Diversity-based sampling, which selects images that are not only uncertain but also diverse in feature space, reducing redundancy and covering a broader range of lesion appearances and environmental conditions.
- Hybrid and task-aware strategies, which incorporate class imbalance, expected error reduction, or domain coverage (e.g., preferring images from underrepresented farms or cultivars).
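The uncertainty-based strategy in the first item reduces to scoring each unlabeled image from the model's predicted class probabilities. The following is a minimal NumPy sketch of entropy and top-2 margin scores, assuming `probs` holds softmax outputs for the unlabeled pool. The labeling budget and toy probabilities are illustrative.

```python
import numpy as np


def entropy_scores(probs: np.ndarray) -> np.ndarray:
    # Higher predictive entropy means a more uncertain prediction.
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def margin_scores(probs: np.ndarray) -> np.ndarray:
    # Gap between the two most probable classes; a smaller gap means more uncertainty.
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]


def select_for_labeling(probs: np.ndarray, budget: int, method: str = "entropy") -> np.ndarray:
    """Return indices of the `budget` most uncertain images in the pool."""
    if method == "entropy":
        order = np.argsort(-entropy_scores(probs))
    elif method == "margin":
        order = np.argsort(margin_scores(probs))
    else:
        raise ValueError(f"unknown method: {method}")
    return order[:budget]


pool_probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # spread over three classes
    [0.50, 0.49, 0.01],  # near tie between two classes
])
print(select_for_labeling(pool_probs, budget=1, method="entropy"))  # [1]
print(select_for_labeling(pool_probs, budget=1, method="margin"))   # [2]
```

The two criteria can disagree. Entropy favors predictions spread across many classes, whereas margin favors near ties between two classes (e.g., early vs. late blight). Diversity-based or hybrid strategies then filter or re-rank this shortlist.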
3.8. Semi-Supervised Learning and Pseudo-Labeling
3.9. Practical Guidelines for Combining Limited-Data Strategies in Tomato Applications
4. Technical Solutions for Multimodal Data Fusion in Tomato Leaf Disease Detection
4.1. Taxonomy of Multimodal Fusion Strategies
Critical Remarks
4.2. Image-Text Fusion for Retrieval and Recognition
4.3. Viral and Image Data Fusion for TYLCV Detection
4.4. Integration with IoT Sensors and Remote Sensing
4.5. Emerging Multimodal Paradigms: CLIP-Style and Cross-Attention Models
4.6. Design Patterns and Ablation Studies for Multimodal Tomato Systems
5. Case Studies: Typical Models and Benchmark Datasets
5.1. Benchmark Datasets for Tomato Leaf Disease Detection
- PlantVillage: a large-scale laboratory-style dataset with uniform backgrounds and controlled lighting, widely adopted as a starting point for tomato disease classification. It contains multiple tomato diseases (e.g., early blight, late blight, leaf mold, Septoria leaf spot, Tomato yellow leaf curl virus), with substantial class imbalance between common and rare diseases.
- PlantDoc: a field-style dataset with natural backgrounds, occlusions, and illumination variations. Compared to PlantVillage, PlantDoc better reflects real-world complexity but provides fewer samples per class.
- Dataset of Tomato Leaves: a mid-sized dataset with natural backgrounds, focusing on a subset of common tomato leaf diseases and healthy leaves, often used to evaluate the robustness of models fine-tuned from PlantVillage.
- Tomato-Village and related field datasets: newer datasets designed to stress-test generalization to rare diseases (e.g., tomato leaf miner, spotted wilt) and diverse cultivation systems. They typically contain fewer images for rare classes, exacerbating limited-data and class-imbalance issues.
- Multimodal datasets (e.g., TLDITRD): paired image–text datasets where each tomato leaf image is accompanied by structured symptom descriptions or free-text annotations. These datasets enable the study of image–text retrieval and multimodal fusion.
5.2. Model Performance on Limited Data
5.2.1. EMA-DeiT
5.2.2. KD-ShuffleNetV2
5.2.3. YOLOv11m with Hyperparameter Optimization
5.3. Multimodal Model Performance
5.4. Comparative Summary of Model Families
5.5. Cross-Dataset Evaluation and Ablation Practices
Actionable Ablations
- Unimodal baselines: image-only, text-only, molecular-only, and environmental-only performance.
- Incremental fusion: image+text, image+sensor, image+molecular, and full fusion; report the gain over image-only.
- Modality dropout: randomly mask one modality at inference to test robustness under missing inputs.
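Modality dropout can also be applied during training so that the fusion model learns to cope with missing inputs. The following is a minimal NumPy sketch, assuming each modality has already been encoded into a fixed-size embedding and that fusion is plain concatenation. The modality names, drop probability, and zero-filling with appended availability flags are illustrative choices, not a prescribed design. Setting `drop_prob=1.0` masks every non-image modality, which gives the missing-input test described above.

```python
from typing import Dict, Optional

import numpy as np


def fuse_with_modality_dropout(
    embeddings: Dict[str, np.ndarray],
    drop_prob: float = 0.3,
    rng: Optional[np.random.Generator] = None,
    keep: str = "image",
) -> np.ndarray:
    """Concatenate per-modality embeddings (each of shape (batch, dim)), randomly dropping some.

    One availability flag per modality is appended so the classifier can tell a
    missing modality apart from a genuinely all-zero embedding. The `keep`
    modality is never dropped, mirroring an image-only fallback.
    """
    if rng is None:
        rng = np.random.default_rng()
    parts, flags = [], []
    for name in sorted(embeddings):  # fixed order keeps the fused feature layout stable
        emb = embeddings[name]
        n = emb.shape[0]
        if name == keep:
            present = np.ones((n, 1), dtype=emb.dtype)
        else:
            # Per-sample Bernoulli mask: 1 = modality kept, 0 = modality dropped.
            present = (rng.random((n, 1)) >= drop_prob).astype(emb.dtype)
        parts.append(emb * present)  # broadcasting zeroes the whole row of a dropped modality
        flags.append(present)
    return np.concatenate(parts + flags, axis=1)


rng = np.random.default_rng(0)
inputs = {
    "image": rng.normal(size=(4, 8)),   # hypothetical image embedding
    "text": rng.normal(size=(4, 6)),    # hypothetical symptom-text embedding
    "sensor": rng.normal(size=(4, 3)),  # hypothetical environmental-sensor features
}
fused = fuse_with_modality_dropout(inputs, drop_prob=0.5, rng=rng)
print(fused.shape)  # (4, 20): 8 + 3 + 6 features plus 3 availability flags
```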
6. Current Challenges and Future Opportunities
6.1. Technical Bottlenecks
6.2. Data and Standardization Issues
6.3. Edge-Ready Deployment: Example Frameworks and Hardware Requirements
- Smartphone diagnosis: image-only baseline with optional symptom text; lightweight CNN/ViT + on-device quantization.
- Greenhouse edge box: camera + environment sensors; periodic inference + risk alerts; robust to missing sensors.
- Hybrid cloud–edge: edge performs screening; uncertain cases uploaded for multimodal fusion and expert review.
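The on-device quantization mentioned for smartphone diagnosis typically maps float32 weights to 8-bit integers. The following is a minimal NumPy sketch of symmetric per-tensor int8 quantization, assuming a single weight tensor. Production toolchains such as TensorFlow Lite or PyTorch quantization add per-channel scales, activation calibration, and integer kernels, none of which are shown here.

```python
import numpy as np


def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # w / scale already lies in [-127, 127]; clipping is only a safeguard.
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(weights)
err = np.abs(dequantize(q, scale) - weights).max()
print(q.nbytes / weights.nbytes, err <= scale / 2 + 1e-6)  # 0.25 True
```

The 4x memory reduction comes at the cost of a rounding error of at most half a quantization step per weight. Whether that affects disease-recognition accuracy has to be checked on the target dataset.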
6.4. Future Opportunities
6.5. Under-Studied Areas and Open Problems
- Semi-supervised multimodal learning: how to reliably exploit large unlabeled image pools together with sparse paired text/molecular/sensor signals without confirmation bias.
- Explainable multimodal diagnosis: explanations that align with agronomic concepts (lesion type, symptom attributes, growth stage) rather than generic saliency maps.
- Unified multimodal benchmarks: standardized datasets and protocols covering multiple regions, devices, and modalities (image/text/sensors/molecular), enabling fair cross-paper comparison.
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations

| Abbreviation | Definition |
|---|---|
| TYLCV | Tomato yellow leaf curl virus |
| DL | Deep learning |
| CNN | Convolutional neural network |
| ViT | Vision Transformer |
| FSL | Few-shot learning |
| SSL | Self-supervised learning |
| GAN | Generative adversarial network |
| mAP | mean Average Precision |
Appendix A. Proof Sketch of the Generalization Bound
References
- Cao, X.; Huang, M.N.; Wang, S.M.; Li, T.; Huang, Y. Tomato yellow leaf curl virus: Characteristics, influence, and regulation mechanism. Plant Physiol. Biochem. 2024, 213, 108812.
- Nwakoby, I.; Iheukwumere, I.; Iheukwumere, C.; Nwakoby, N.; Idigo, M.; Ike, V. Food safety and law: The role of microbiology in ensuring safe food products. IPS J. Nutr. Food Sci. 2025, 4, 601–607.
- Ahmed, M.; Babayola, M.; Bake, I. Role of Horticultural Crops in Food and Nutritional Security: A Review. J. Nutr. Food Process. 2024, 7, 1–6.
- Lata, S.; Hussain, Z.; Yadav, R.; Jat, G.S.; Kumar, P.; Tomar, B. Insights into the genetic improvement of tomato. In Genetic Engineering of Crop Plants for Food and Health Security: Volume 2; Springer: Berlin/Heidelberg, Germany, 2024; pp. 165–184.
- Arain, S.M.; Sajjad, M.; Faheem, M.; Ullah, G.; Laghari, K.A.; Sial, M.A. Confronting Abiotic Stresses: Molecular Strategies for Improving Tomato Stress Tolerance. In Omics Approaches for Tomato Yield and Quality Trait Improvement; Springer: Berlin/Heidelberg, Germany, 2025; pp. 55–94.
- Sun, C.; Li, Y.; Song, Z.D.; Liu, Q.; Si, H.P.; Yang, Y.J.; Cao, Q. Research on tomato disease image recognition method based on DeiT. Eur. J. Agron. 2025, 162, 127400.
- Gašić, K.; Ivanović, M.M.; Ignjatov, M.; Calić, A.; Obradović, A. Isolation and characterization of Xanthomonas euvesicatoria bacteriophages. J. Plant Pathol. 2011, 2, 415–423.
- Chaerani, R.; Voorrips, R.E. Tomato early blight (Alternaria solani): The pathogen, genetics, and breeding for resistance. J. Gen. Plant Pathol. 2006, 72, 335–347.
- Legard, D.; Lee, T.; Fry, W. Pathogenic specialization in Phytophthora infestans: Aggressiveness on tomato. Phytopathology 1995, 85, 1356–1361.
- Moriones, E.; Navas-Castillo, J. Tomato yellow leaf curl virus, an emerging virus complex causing epidemics worldwide. Virus Res. 2000, 71, 123–134.
- Watanabe, H.; Horinouchi, H.; Muramoto, Y.; Ishii, H. Occurrence of azoxystrobin-resistant isolates in Passalora fulva, the pathogen of tomato leaf mould disease. Plant Pathol. 2017, 66, 1472–1479.
- Pritchard, F.J.; Porte, W. The relation of temperature and humidity to tomato leaf spot (Septoria lycopersici Speg.). Phytopathology 1924, 14, 156–169.
- Guo, Q.; Sun, Y.; Ji, C.; Kong, Z.; Liu, Z.; Li, Y.; Li, Y.; Lai, H. Plant resistance to tomato yellow leaf curl virus is enhanced by Bacillus amyloliquefaciens Ba13 through modulation of RNA interference. Front. Microbiol. 2023, 14, 1251698.
- Sánchez, M.S.; Hernández, E.A.; Quintana-Obregón, E.A.; Arispuro, I.V.; Téllez, M.Á.M. Estimating tomato production losses due to plant viruses, a look at the past and new challenges. Comun. Sci. 2024, 15, 71.
- Akbar, A.; Al Hashash, H.; Al-Ali, E. Tomato yellow leaf curl virus (TYLCV) in Kuwait and global analysis of the population structure and evolutionary pattern of TYLCV. Virol. J. 2024, 21, 308.
- Kumar, M.; Bag, S.; McAvoy, T.; Torrance, T.; Cloud, C.; Simmons, A.M. A shift in begomovirus Coheni populations associated with tomato yellow leaf curl disease infecting tomato cultivars in the southeastern United States. Plant Pathol. 2025, 74, 1277–1289.
- Moldvai, L.; Nyéki, A. Innovative computer vision methods for tomato (Solanum Lycopersicon) detection and cultivation: A review. Discov. Appl. Sci. 2025, 7, 975.
- Deng, S.; Zhu, J.; Hu, Y.; He, M.; Xia, Y. Tomato Leaf Disease Identification Framework FCMNet Based on Multimodal Fusion. Plants 2025, 14, 2329.
- Upadhyay, A.; Patel, A.; Patel, A.; Chandel, N.S.; Chakraborty, S.K.; Bhalekar, D.G. Leveraging AI and ML in Precision Farming for Pest and Disease Management: Benefits, Challenges, and Future Prospects. In Ecologically Mediated Development: Promoting Biodiversity Conservation and Food Security; Springer: Singapore, 2025; pp. 511–528.
- Castillo-Girones, S.; Munera, S.; Martínez-Sober, M.; Blasco, J.; Cubero, S.; Gómez-Sanchis, J. Artificial Neural Networks in Agriculture, the core of artificial intelligence: What, When, and Why. Comput. Electron. Agric. 2025, 230, 109938.
- Kumari, S.; Venkatesh, V.; Tan, F.T.C.; Bharathi, S.V.; Ramasubramanian, M.; Shi, Y. Application of machine learning and artificial intelligence on agriculture supply chain: A comprehensive review and future research directions. Ann. Oper. Res. 2025, 348, 1573–1617.
- Ali, Z.; Muhammad, A.; Lee, N.; Waqar, M.; Lee, S.W. Artificial Intelligence for sustainable agriculture: A comprehensive review of AI-driven technologies in crop production. Sustainability 2025, 17, 2281.
- Aijaz, N.; Lan, H.; Raza, T.; Yaqub, M.; Iqbal, R.; Pathan, M.S. Artificial intelligence in agriculture: Advancing crop productivity and sustainability. J. Agric. Food Res. 2025, 20, 101762.
- Khan, R.; Ud Din, N.; Zaman, A.; Huang, B. Automated Tomato Leaf Disease Detection Using Image Processing: An SVM-Based Approach with GLCM and SIFT Features. J. Eng. 2024, 2024, 9918296.
- Shanthi, D.; Vinutha, K.; Ashwini, N.; Vashistha, S. Tomato leaf disease detection using CNN. Procedia Comput. Sci. 2024, 235, 2975–2984.
- Gehlot, M.; Saxena, R.K.; Gandhi, G.C. “Tomato-Village”: A dataset for end-to-end tomato disease detection in a real-world environment. Multimed. Syst. 2023, 29, 3305–3328.
- Wang, X.; Liu, J. An efficient deep learning model for tomato disease detection. Plant Methods 2024, 20, 61.
- Sun, W.; Xu, Z.; Xu, K.; Ru, L.; Yang, R.; Wang, R.; Xing, J. Ultra-lightweight tomatoes disease recognition method based on efficient attention mechanism in complex environment. Front. Plant Sci. 2025, 15, 1491593.
- Ajith, S.; Vijayakumar, S.; Elakkiya, N. Yield prediction, pest and disease diagnosis, soil fertility mapping, precision irrigation scheduling, and food quality assessment using machine learning and deep learning algorithms. Discov. Food 2025, 5, 67.
- Jonak, M.; Mucha, J.; Jezek, S.; Kovac, D.; Cziria, K. SPAGRI-AI: Smart precision agriculture dataset of aerial images at different heights for crop and weed detection using super-resolution. Agric. Syst. 2024, 216, 103876.
- Li, H.; Chen, B.; Chen, J.; Li, S.; He, F.; Hu, Y. ITIMCA: Image-text information and cross-attention for multi-modal cassava leaf disease classification based on a novel multi-modal dataset in natural environments. Crop Prot. 2025, 189, 106981.
- El Sakka, M.; Ivanovici, M.; Chaari, L.; Mothe, J. A review of CNN applications in smart agriculture using multimodal data. Sensors 2025, 25, 472.
- Sapkota, R.; Qureshi, R.; Hadi, M.U.; Hassan, S.Z.; Sadak, F.; Shoman, M.; Sajjad, M.; Dharejo, F.A.; Paudel, A.; Li, J.; et al. Multi-modal LLMs in agriculture: A comprehensive review. IEEE Trans. Autom. Sci. Eng. 2025, 22, 22510–22540.
- Li, Q.; Zhang, Y.; Mai, Z.; Chen, Y.; Lou, S.; Huang, H.; Zhang, J.; Zhang, Z.; Wen, Y.; Li, W.; et al. Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind. arXiv 2025, arXiv:2505.12207.
- Hussain, I.; Farooq, T.; Khan, S.A.; Ali, N.; Waris, M.; Jalal, A.; Nielsen, S.L.; Ali, S. Variability in indigenous Pakistani tomato lines and worldwide reference collection for Tomato Mosaic Virus (ToMV) and Tomato Yellow Leaf Curl Virus (TYLCV) infection. Braz. J. Biol. 2022, 84, e253605.
- Li, F.; Qiao, R.; Yang, X.; Gong, P.; Zhou, X. Occurrence, distribution, and management of tomato yellow leaf curl virus in China. Phytopathol. Res. 2022, 4, 28.
- Ni, S.; Jia, Y.; Zhu, M.F.; Zhang, Y.Z.; Wang, W.D.; Liu, S.X.; Chen, Y.W. An improved ShuffleNetV2 method based on ensemble self-distillation for tomato leaf diseases recognition. Front. Plant Sci. 2025, 15, 1521008.
- Gupta, S.; Tripathi, A.K.; Lewis, N. Pre-trained noise based unsupervised GAN for fruit disease classification in imbalanced datasets. Pattern Anal. Appl. 2025, 28, 39.
- Shoaib, M.; Hussain, T.; Shah, B.; Ullah, I.; Shah, S.M.; Ali, F.; Park, S.H. Deep learning-based segmentation and classification of leaf images for detection of tomato plant disease. Front. Plant Sci. 2022, 13, 1031748.
- Ma, Y.; Tian, Y.; Moniz, N.; Chawla, N.V. Class-imbalanced learning on graphs: A survey. ACM Comput. Surv. 2025, 57, 207.
- Vinothini, A.; Aswiga, R. Transfer learning based deep learning model for classifying tomato plant leaf diseases. Eng. Res. Express 2025, 7, 025250.
- Pazou, M.G.A.; Sobabe, A.A.; Kouhoundji, N.; Dovonou, C. Detection of bacterial spot and yellow leaf curl virus in tomato leaves images using deep learning. In Proceedings of the 2021 International Conference on Electrical, Computer and Energy Technologies (ICECET), Cape Town, South Africa, 9–10 December 2021; pp. 1–5.
- Alzahrani, M. Automated Tomato Defect Detection Using CNN Feature Fusion for Enhanced Classification. Processes 2025, 13, 115.
- Nishankar, S.; Mithuran, T.; Thuseethan, S.; Sebastian, Y.; Yeo, K.C.; Shanmugam, B. TOM-SSL: Tomato Disease Recognition Using Pseudo-Labelling-Based Semi-Supervised Learning. AgriEngineering 2025, 7, 248.
- Dhiab, Y.B.; Aoueileyine, M.O.E.; Namoun, A.; Bouallegue, R. TomDetLeaf: A Realistic Multi-Source Dataset for Real-Time Tomato Leaf Detection. Int. J. Adv. Comput. Sci. Appl. 2025, 16.
- Tang, X.; Sun, Z.; Yang, L.; Chen, Q.; Liu, Z.; Wang, P.; Zhang, Y. YOLOv11-AIU: A lightweight detection model for the grading detection of early blight disease in tomatoes. Plant Methods 2025, 21, 118.
- Singh, D.; Jain, N.; Jain, P.; Kayal, P.; Kumawat, S.; Batra, N. PlantDoc: A dataset for visual plant disease detection. In Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, Hyderabad, India, 5–7 January 2020; pp. 249–253.
- Xu, J.X.; Zhou, H.L.; Hu, Y.F.; Xue, Y.F.; Zhou, G.X.; Li, L.J.; Dai, W.S.; Li, J.Y. High-Accuracy Tomato Leaf Disease Image-Text Retrieval Method Utilizing LAFANet. Plants 2024, 13, 1176.
- Nakagawa, Y.; Sano, H.; Takata, T. Classification of Tomato Growth Degree Adopting Machine-Learning to Photomorphogenesis Information in the Visible Light Region. In Proceedings of the 2025 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Fukuoka, Japan, 18–21 February 2025; pp. 82–86.
- Zhang, K.; Chai, Q.; Qian, X.; Gao, R.; Liu, X.; Yang, L.; Pang, G.; Wang, Y.; Sun, J. Potential of machine learning in leaf-based multi-source data driven tomato growth monitoring. Smart Agric. Technol. 2025, 10, 100854.
- Huo, Y.; Liu, Y.; He, P.; Hu, L.; Gao, W.; Gu, L. Identifying Tomato Growth Stages in Protected Agriculture with StyleGAN3–Synthetic Images and Vision Transformer. Agriculture 2025, 15, 120.
- Oni, M.K.; Prama, T.T. A comprehensive dataset of tomato leaf images for disease analysis in Bangladesh. Data Brief 2025, 59, 111327.
- Skoric, D.; Zindovic, J.; Grbin, D.; Pul, P.; Božović, V.; Margaria, P.; Mehle, N.; Pecman, A.; Kogej Zwitter, Z.; Kutnjak, D.; et al. Tomato spotted wilt virus in tomato from Croatia, Montenegro and Slovenia: Genetic diversity and evolution. Front. Microbiol. 2025, 16, 1618327.
- Li, Z.G.; Tang, Y.F.; She, X.M.; Yu, L.; Lan, G.B.; Ding, S.W.; He, Z.F. Characterisation of a Betasatellite Associated with Tomato Yellow Leaf Curl Guangdong Virus and Discovery of an Unusual Modulation of Virus Infection Associated with C4 Protein. Mol. Plant Pathol. 2025, 26, e70051.
- Arnal Barbedo, J.G. Digital image processing techniques for detecting, quantifying and classifying plant diseases. SpringerPlus 2013, 2, 660.
- Kamilaris, A.; Prenafeta-Boldú, F.X. Deep learning in agriculture: A survey. Comput. Electron. Agric. 2018, 147, 70–90.
- Ferentinos, K.P. Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 2018, 145, 311–318.
- Ma, X.; Zhang, X.; Guan, H.; Wang, L. Recognition Method of Crop Disease Based on Image Fusion and Deep Learning Model. Agronomy 2024, 14, 1518.
- Attri, I.; Awasthi, L.K.; Sharma, T.P. Machine learning in agriculture: A review of crop management applications. Multimed. Tools Appl. 2024, 83, 12875–12915.
- Pacal, I.; Kunduracioglu, I.; Alma, M.H.; Deveci, M.; Kadry, S.; Nedoma, J.; Slany, V.; Martinek, R. A systematic review of deep learning techniques for plant diseases. Artif. Intell. Rev. 2024, 57, 304.
- Guo, R.; Li, B.; Zhao, Y.; Tang, C.; Klosterman, S.J.; Wang, Y. Rhizobacterial Bacillus enrichment in soil enhances smoke tree resistance to Verticillium wilt. Plant Cell Environ. 2024, 47, 4086–4100.
- Xiong, S.; Wang, L.; Zhang, Y.; Dong, P.; Wang, B.; Che, Y.; Shi, L.; Si, H. Boosting crop disease recognition via automated image description generation and multimodal fusion. Comput. Electron. Agric. 2025, 239, 111082.
- Ogidi, F.C.; Eramian, M.G.; Stavness, I. Benchmarking self-supervised contrastive learning methods for image-based plant phenotyping. Plant Phenomics 2023, 5, 37.
- Xin, Y.; Liu, L.; Yang, X.R.; Yang, L.Y.; Guang, S.B.; Zheng, Y.M.; Zhao, Q.B. Adaptive shifts in plant traits associated with nitrogen removal driven by phytoremediation strategies in subtropical river restoration. Water Res. 2024, 249, 121008.
- Yang, S.; Feng, Q.; Guo, F.; Zhou, W. Estimation of Potato Growth Parameters Under Limited Field Data Availability by Integrating Few-Shot Learning and Multi-Task Learning. Agriculture 2025, 15, 1638.
- Adhikari, P.; Oh, Y.; Panthee, D.R. Current status of early blight resistance in tomato: An update. Int. J. Mol. Sci. 2017, 18, 2019.
- Nowicki, M.; Kozik, E.U.; Foolad, M.R. Late blight of tomato. Transl. Genom. Crop Breed. Biot. Stress 2013, 1, 241–265.
- Lee, Y.S.; Patil, M.P.; Kim, J.G.; Seo, Y.B.; Ahn, D.H.; Kim, G.D. Hyperparameter Optimization for Tomato Leaf Disease Recognition Based on YOLOv11m. Plants 2025, 14, 653.
- Abd-Alla, M.H.; Bashandy, S.R.; Schnell, S.; Ratering, S. Isolation and characterization of Serratia rubidaea from dark brown spots of tomato fruits. Phytoparasitica 2011, 39, 175–183.
- Sharma, S.; Bhattarai, K. Progress in developing bacterial spot resistance in tomato. Agronomy 2019, 9, 26.
- Dovas, C.; Katis, N.; Avgelis, A. Multiplex detection of criniviruses associated with epidemics of a yellowing disease of tomato in Greece. Plant Dis. 2002, 86, 1345–1349.
- Liu, X.; Lin, Y.; Wu, C.; Yang, Y.; Su, D.; Xian, Z.; Zhu, Y.; Yu, C.; Hu, G.; Deng, W.; et al. The SlARF4-SlHB8 regulatory module mediates leaf rolling in tomato. Plant Sci. 2023, 335, 111790.
- Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110.
- Koroteev, M.V. BERT: A review of applications in natural language processing and understanding. arXiv 2021, arXiv:2103.11943.
- Zhang, L.; Bao, C.; Ma, K. Self-distillation: Towards efficient and compact neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4388–4403.
- Wang, X.; Zhang, R.; Shen, C.; Kong, T.; Li, L. Dense contrastive learning for self-supervised visual pre-training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 3024–3033.
- Zhang, R.; Liu, C.; Su, Y.; Li, R.; Huang, X.; Li, X.; Yu, P.S. A Comprehensive Survey on Multimodal RAG: All Combinations of Modalities as Input and Output. TechRxiv 2025.
- Zhao, K.; Wu, X.; Xiao, Y.; Jiang, S.; Yu, P.; Wang, Y.; Wang, Q. PlanText: Gradually Masked Guidance to Align Image Phenotypes with Trait Descriptions for Plant Disease Texts. Plant Phenomics 2024, 6, 272.
- Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359.
- Gholizade, M.; Soltanizadeh, H.; Rahmanimanesh, M.; Sana, S.S. A review of recent advances and strategies in transfer learning. Int. J. Syst. Assur. Eng. Manag. 2025, 16, 1123–1162.
- Hossen, M.I.; Awrangjeb, M.; Pan, S.; Mamun, A.A. Transfer learning in agriculture: A review. Artif. Intell. Rev. 2025, 58, 97.
- Targ, S.; Almeida, D.; Lyman, K. ResNet in ResNet: Generalizing residual architectures. arXiv 2016, arXiv:1603.08029.
- Sun, W.; Zhang, X.; He, X. Lightweight image classifier using dilated and depthwise separable convolutions. J. Cloud Comput. 2020, 9, 55.
- Dai, X.; Chen, Y.; Xiao, B.; Chen, D.; Liu, M.; Yuan, L.; Zhang, L. Dynamic head: Unifying object detection heads with attentions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7373–7382.
- Chen, W.; Yang, K.; Yu, Z.; Shi, Y.; Chen, C.P. A survey on imbalanced learning: Latest research, applications and future directions. Artif. Intell. Rev. 2024, 57, 137.
- Wu, X.; Fan, X.; Luo, P.; Choudhury, S.D.; Tjahjadi, T.; Hu, C. From laboratory to field: Unsupervised domain adaptation for plant disease recognition in the wild. Plant Phenomics 2023, 5, 38.
- Agarwal, S.; Krueger, G.; Clark, J.; Radford, A.; Kim, J.W.; Brundage, M. Evaluating CLIP: Towards characterization of broader capabilities and downstream implications. arXiv 2021, arXiv:2108.02818.
- Li, J.; Li, D.; Xiong, C.; Hoi, S. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 12888–12900.
- Liu, H.; Li, C.; Wu, Q.; Lee, Y.J. LLaVA: Large language and vision assistant. In Proceedings of the International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
- Noyan, M.A. Uncovering bias in the PlantVillage dataset. arXiv 2022, arXiv:2206.04374.
- Mohanty, S.P.; Hughes, D.P.; Salathé, M. Using deep learning for image-based plant disease detection. Front. Plant Sci. 2016, 7, 1419.
- Li, D.; Chen, S. Fine-grained Image Classification Based on MogaNet Network and Multi-level Gating Mechanism. Front. Neurorobotics 2025, 19, 1630281.







| Aspect | Data | Compute | Deploy | Notes |
|---|---|---|---|---|
| Small sample size | 5 | 2 | 3 | Rare diseases and expensive expert labeling. |
| Class imbalance | 4 | 2 | 3 | Minority recall is critical in practice. |
| Field noise/domain shift | 5 | 3 | 5 | Largest gap between lab benchmarks and farms. |
| Multimodal alignment | 4 | 4 | 4 | Sparse paired data and missing modalities. |
| Interpretability | 3 | 2 | 5 | Needed for farmer/agronomist trust. |

| Disease | Pathogen (Type) | Typical Leaf Symptoms |
|---|---|---|
| Bacterial spot | Xanthomonas (bacteria) | Small, water-soaked spots turning dark; may coalesce under humid conditions. |
| Early blight | Alternaria solani (fungus) | Concentric rings (“target spots”), often starting on older leaves. |
| Late blight | Phytophthora infestans (oomycete) | Irregular lesions with pale edges; fast spread under cool/wet conditions. |
| TYLCV | Begomovirus (virus) | Yellowing, upward curling, stunting; reduced fruit set. |
| Leaf mold | Passalora fulva (fungus) | Yellow spots on the upper leaf surface; olive-green mold on the underside under high humidity. |
| Septoria leaf spot | Septoria lycopersici (fungus) | Numerous small grayish spots with dark margins; defoliation in severe cases. |
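
| Symbol | Meaning |
|---|---|
| $\mathcal{X}_I$, $\mathcal{X}_T$, $\mathcal{X}_M$ | Image/text/molecular input spaces |
| $\mathcal{Y} = \{1, \dots, K\}$ | Disease label set (including healthy) |
| $f_\theta : \mathcal{X}_I \times \mathcal{X}_T \times \mathcal{X}_M \to \Delta^{K-1}$ | Multimodal classifier with parameters $\theta$ outputting class probabilities |
| $\Delta^{K-1}$ | Probability simplex over $K$ classes |
| $\ell$ | Loss function (cross-entropy: $\ell(f_\theta(x), y) = -\log f_\theta(x)_y$) |
| $\hat{R}_n(\theta)$, $R(\theta)$ | Empirical risk vs. expected (true) risk |
| $\pi_k = P(y = k)$ | Class prior; imbalance when $\max_k \pi_k / \min_k \pi_k \gg 1$ |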
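Under standard assumptions ($n$ labeled training pairs drawn i.i.d. from an unknown distribution $P$, with $x = (x_I, x_T, x_M)$), the quantities in the symbol table can be written out as follows. This is a conventional formulation offered for reference; the exact notation in Section 1.2.4 may differ in detail.

$$
\begin{aligned}
\ell\big(f_\theta(x), y\big) &= -\log f_\theta(x)_y, \qquad f_\theta(x) \in \Delta^{K-1}, \ y \in \mathcal{Y} && \text{(cross-entropy)} \\
\hat{R}_n(\theta) &= \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_\theta(x_i), y_i\big) && \text{(empirical risk)} \\
R(\theta) &= \mathbb{E}_{(x, y) \sim P}\big[\ell\big(f_\theta(x), y\big)\big] && \text{(expected risk)} \\
\pi_k &= P(y = k), \qquad \rho = \frac{\max_k \pi_k}{\min_k \pi_k} \gg 1 && \text{(class imbalance)}
\end{aligned}
$$

With few labeled leaves, $\hat{R}_n(\theta)$ can be driven close to zero while $R(\theta)$ remains high; this gap is what the limited-data strategies aim to close. A large imbalance ratio $\rho$ means minority diseases contribute little to $\hat{R}_n(\theta)$ unless they are reweighted.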

| Aspect | Summary |
|---|---|
| Databases and period | Web of Science, Scopus, Google Scholar; years 2015–2025 (focus on 2023–2025). |
| Search strategy | Keyword search on tomato leaf disease detection, limited data, few-shot/self-supervised learning, domain generalization, multimodal fusion. |
| Inclusion/exclusion | Include: tomato leaf diseases + AI/ML/DL methods + quantitative results; exclude: non-tomato crops/tasks or studies without clear methodology/metrics. |
| Study coding | Two-stage coding into limited-data, multimodal, and deployment categories; datasets, backbones, and evaluation protocols recorded for each study. |
| Bias and mitigation | Potential language, indexing, and publication bias mitigated via multi-database search, citation chasing, and manual screening to reduce topic drift. |

| Challenge | Effect on Models | Typical Strategies |
|---|---|---|
| Small sample size | Overfitting to few labeled images; poor generalization to new fields or cultivars. | Transfer learning from large datasets; self-supervised pretraining on unlabeled field images; few-shot/meta-learning; active learning for expert labeling. |
| Class imbalance | Bias toward frequent diseases; low recall/F1 for rare but important classes. | Class-weighted or focal loss; over-/under-sampling; synthetic minority generation (GANs, diffusion); ensemble and self-distillation focusing on rare classes. |
| Low-quality/ noisy images | Lesion cues obscured by clutter, blur, or lighting; large gap between lab and field domains. | Task-aware augmentation; domain adaptation/generalization; robust architectures and regularization; training on mixed lab–field datasets. |
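
The class-weighted and focal losses listed for class imbalance can be sketched in a few lines. The plain-Python example below is an illustration only: the function names (`inverse_frequency_weights`, `focal_loss`) and the 90/10 class split are hypothetical and not taken from any reviewed study. Inverse-frequency weights up-weight rare classes, and the focal factor (1 − p_y)^γ further down-weights easy, confidently classified samples.

```python
import math
from collections import Counter

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def inverse_frequency_weights(labels, num_classes):
    """Per-class weights proportional to 1 / count, rescaled to mean 1 over observed classes."""
    counts = Counter(labels)
    raw = [1.0 / counts[k] if counts[k] > 0 else 0.0 for k in range(num_classes)]
    observed = [w for w in raw if w > 0]
    mean = sum(observed) / len(observed)
    return [w / mean for w in raw]

def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Single-sample focal loss: -alpha[y] * (1 - p_y)**gamma * log(p_y).
    With gamma=0 and alpha=None it reduces to ordinary cross-entropy."""
    p_y = softmax(logits)[target]
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_y) ** gamma * math.log(p_y)

# Hypothetical split: 90 images of a frequent class (0) and 10 of a rare class (1).
weights = inverse_frequency_weights([0] * 90 + [1] * 10, num_classes=2)
print([round(w, 3) for w in weights])  # [0.2, 1.8]
easy = focal_loss([2.0, 0.0], target=0, gamma=2.0, alpha=weights)
hard = focal_loss([2.0, 0.0], target=1, gamma=2.0, alpha=weights)
print(hard > easy)  # True
```

In practice γ and the weighting scheme are tuned per dataset; aggressively up-weighting a handful of rare images can itself lead to overfitting on those samples.
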

| Strategy | Principle | Representative Methods | Advantages | Limitations |
|---|---|---|---|---|
| Transfer learning | Initialize from large-scale pretraining and fine-tune on tomato data | ImageNet-pretrained CNN/ViT; DeiT-style fine-tuning; domain-specific pretraining | Strong baseline; fast convergence | Residual domain shift; sensitive to fine-tuning recipe |
| Self-/ensemble distillation | Use soft targets from EMA/ensemble to regularize learning | EMA teacher; multi-branch self-distillation; snapshot ensemble distillation | Improves generalization under small data | Training/inference overhead; teacher quality matters |
| Data augmentation and regularization | Expand effective sample diversity and reduce overfitting | RandAugment/AutoAugment; MixUp/CutMix; label smoothing; stochastic depth | Cheap; plug-and-play; boosts robustness | May distort symptoms; tuning cost; gains saturate |
| Self-supervised pretraining | Learn transferable representations without labels, then fine-tune | SimCLR/MoCo/DINO; MAE-style masked pretraining on plant images | Better feature reuse; label-efficient | Extra pretraining compute; mismatch if pretrain domain differs |
| Semi-supervised learning | Leverage unlabeled tomato images via consistency/pseudo-labels | FixMatch/Mean Teacher; pseudo-labeling with confidence threshold | Reduces label demand; strong under scarce labels | Error amplification; sensitive to threshold/imbalance |
| Few-shot/metric learning | Classify by learned embedding distances with few labeled examples | Prototypical Networks; Matching Networks; cosine classifier; episodic training | Good for new diseases/rare classes | Episode design complexity; unstable if intra-class variance is high |
| Synthetic data/generative augmentation | Generate or translate images to enlarge target distribution | GAN-based synthesis; diffusion-based generation; style transfer (lab→field) | Covers rare cases; enriches backgrounds | Quality/label fidelity risk; may introduce artifacts/bias |
| Domain generalization/adaptation | Reduce domain shift between lab and field settings | Domain adversarial training; style normalization; test-time adaptation (TTA) | Improves cross-dataset robustness | May require target data; stability/reproducibility issues |
| Active learning | Query the most informative samples to label first | Uncertainty sampling; diversity sampling; core-set selection | Maximizes annotation efficiency | Needs iterative labeling loop; selection bias risk |
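
Several mechanisms in the table reduce to a few lines of arithmetic. The sketch below shows, on toy plain-Python vectors, an EMA teacher update, MixUp, confidence-thresholded pseudo-labeling (FixMatch-style), nearest-prototype classification (Prototypical Networks), and entropy-based uncertainty sampling for active learning. All function names, the decay of 0.999, the 0.95 threshold, and the Beta(0.2, 0.2) MixUp prior are illustrative assumptions rather than settings from the reviewed studies; real pipelines apply the same arithmetic to network weights, images, and embeddings inside a deep-learning framework.

```python
import math
import random

def ema_update(teacher, student, decay=0.999):
    """EMA teacher: each teacher weight becomes a moving average of the student weight."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

def sample_mixup_lambda(alpha=0.2, rng=random):
    """MixUp mixing coefficient drawn from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)

def mixup(x1, y1, x2, y2, lam):
    """MixUp: the same convex combination is applied to inputs and one-hot labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y

def pseudo_label(probs, threshold=0.95):
    """Confidence-thresholded pseudo-label: argmax class, or None if not confident enough."""
    k = max(range(len(probs)), key=probs.__getitem__)
    return k if probs[k] >= threshold else None

def class_prototypes(embeddings, labels):
    """Prototypical networks: a class prototype is the mean of its support embeddings."""
    sums, counts = {}, {}
    for e, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * len(e))
        for i, v in enumerate(e):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def nearest_prototype(query, prototypes):
    """Predict the class whose prototype has the smallest squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(prototypes, key=lambda y: sq_dist(query, prototypes[y]))

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(pool_probs, budget):
    """Uncertainty sampling: indices of the `budget` unlabeled samples with highest entropy."""
    order = sorted(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]), reverse=True)
    return order[:budget]

# Toy usage with hypothetical 2-D embeddings and class names.
support = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.2, 0.8]]
labels = ["healthy", "healthy", "early_blight", "early_blight"]
protos = class_prototypes(support, labels)
print(nearest_prototype([0.9, 1.1], protos))  # early_blight
print(pseudo_label([0.97, 0.02, 0.01]))  # 0
print(pseudo_label([0.60, 0.30, 0.10]))  # None
print(select_most_uncertain([[0.98, 0.02], [0.5, 0.5], [0.7, 0.3]], budget=1))  # [1]
```

The pseudo-labeling threshold illustrates the "error amplification" limitation in the table: lowering it admits more unlabeled leaves but also more wrong labels, and with class imbalance the rare classes rarely clear a fixed threshold.
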

| Modality | What It Provides | Common Fusion Strategies |
|---|---|---|
| Image (RGB/IR) | Lesion color/shape/distribution; visual symptoms | Early fusion, late fusion, cross-attention |
| Text (symptom descriptions) | Semantic symptom attributes; context (stage/management) | CLIP-style alignment, cross-attention, late fusion |
| Molecular (virus/genomics) | Direct evidence of infection/strain; early signal | Hybrid fusion, MoE experts, late fusion when sparse |
| Sensors/environment | Risk factors (humidity/temperature/leaf wetness) | Temporal fusion (RNN/TCN), hybrid fusion, MoE |

| Fusion Type | How It Works | Accuracy Potential | Interpretability | Compute Cost | Robustness to Missing Modalities |
|---|---|---|---|---|---|
| Feature-level (early) | Fuse embeddings before the classifier (concat/gating/attention) | High | Medium | High | Low |
| Decision-level (late) | Separate unimodal models; fuse scores/probabilities | Medium | Medium–High | Low–Medium | High |
| Hybrid (cross-attn/MoE) | Cross-modal interaction + modality-specific heads/experts | High | High | Medium–High | Medium–High |
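
The hybrid row relies on cross-modal interaction, most often cross-attention. A minimal sketch, assuming single-head attention with the learned query/key/value projections omitted and hypothetical 2-D token embeddings, shows how image tokens can attend to text tokens; the resulting attention weights are what give hybrid fusion its interpretability. Modality-specific heads and MoE gating are not shown.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention, learned projections omitted.
    queries: n_q x d, keys: n_k x d, values: n_k x d_v.
    Returns outputs (n_q x d_v) and attention weights (n_q x n_k)."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out = [sum(w[j] * values[j][c] for j in range(len(values)))
               for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

def hybrid_fuse(image_tokens, text_tokens):
    """Image tokens attend to text tokens; the attended context is added back residually."""
    context, weights = cross_attention(image_tokens, text_tokens, text_tokens)
    fused = [[a + b for a, b in zip(img, ctx)] for img, ctx in zip(image_tokens, context)]
    return fused, weights

# Hypothetical embeddings: two image patches and two symptom phrases.
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
text_tokens = [[2.0, 0.0], [0.0, 2.0]]
fused, attn = hybrid_fuse(image_tokens, text_tokens)
print([[round(w, 3) for w in row] for row in attn])  # [[0.804, 0.196], [0.196, 0.804]]
```

Each row of `attn` states how strongly a given image region relied on each symptom description, which is the kind of evidence that can be shown to an agronomist alongside the prediction.
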

| Fusion Type | Mechanism | Main Strengths | Main Weaknesses |
|---|---|---|---|
| Feature-level (early) | Concatenate or transform modality features before classification. | Rich joint representation; captures fine-grained cross-modal interactions. | High dimensionality; prone to overfitting under few paired samples; sensitive to missing modalities. |
| Decision-level (late) | Separate classifiers; fuse probabilities or scores. | Simple; robust when some modalities are absent; flexible with heterogeneous data. | Limited cross-modal interaction; fusion weights often hand-tuned; may ignore subtle complementarities. |
| Hybrid fusion | Attention-based feature interaction plus modality-specific heads. | Balances expressiveness and robustness; provides interpretable cross-modal attention. | More parameters and training complexity; unstable when multimodal pairs are very sparse or noisy. |
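
The practical difference between early and late fusion when a modality is missing can be illustrated as follows. This is a sketch under assumptions: the modality names, embedding sizes, and fusion weights are hypothetical, and zero-filling is only one of several ways to handle an absent modality in early fusion (learned placeholder embeddings and modality-dropout training are alternatives).

```python
def early_fusion(features, dims):
    """Feature-level fusion: concatenate embeddings in the fixed order given by `dims`.
    An absent modality is zero-filled so the fused vector keeps a fixed length."""
    fused = []
    for name, dim in dims.items():
        vec = features.get(name)
        fused.extend(vec if vec is not None else [0.0] * dim)
    return fused

def late_fusion(probs_by_modality, weights):
    """Decision-level fusion: weighted average of per-modality class probabilities.
    Weights are renormalized over the modalities that are present (value not None)."""
    present = {m: p for m, p in probs_by_modality.items() if p is not None}
    if not present:
        raise ValueError("at least one modality must be available")
    total = sum(weights[m] for m in present)
    num_classes = len(next(iter(present.values())))
    return [sum(weights[m] * p[c] for m, p in present.items()) / total
            for c in range(num_classes)]

# Hypothetical case: text embedding / molecular assay unavailable for this leaf.
print(early_fusion({"image": [0.1, 0.2, 0.3]}, {"image": 3, "text": 2}))
# [0.1, 0.2, 0.3, 0.0, 0.0]
probs = late_fusion(
    {"image": [0.7, 0.3], "text": [0.4, 0.6], "molecular": None},
    weights={"image": 0.5, "text": 0.3, "molecular": 0.2},
)
print([round(p, 4) for p in probs])  # [0.5875, 0.4125]
```

Late fusion degrades gracefully because renormalizing over the available modalities still yields a valid distribution, whereas the zero-filled early-fusion vector may lie far from anything the classifier saw during training.
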

| Dataset | # Images | # Classes | Disease Categories (Examples) | Imaging Conditions | Modalities | Paired? |
|---|---|---|---|---|---|---|
| PlantVillage [90] | ∼18 k | 10–15 | TYLCV, early/late blight, leaf mold, Septoria, healthy (tomato subset) | Lab-like, controlled background/lighting | RGB | No |
| PlantDoc [47] | ∼2.5 k | 8–10 | Field diseases with cluttered backgrounds (tomato subset) | In-the-wild, occlusion, illumination variation | RGB | No |
| Tomato-Village [26] | ∼7 k | 8 | Includes rare classes (e.g., leaf miner, spotted wilt) | Multi-region field captures | RGB | No |
| Dataset of Tomato Leaves [6] | ∼6 k | 6 | Common diseases + healthy | Field/greenhouse, natural background | RGB | No |
| TLDITRD [48] | ∼6 k pairs | 6 | Six tomato disease classes (paired descriptions) | Field settings, paired annotations | RGB + Text | Yes |

| Model | Task | Dataset(s) | Key Reported Results/Notes |
|---|---|---|---|
| EMA-DeiT [6] | Classification | PlantVillage, PlantDoc, Tomato-Village, Tomato Leaves | Accuracy: 99.6% (PlantVillage), 97.1% (PlantDoc); strong baseline, but a lab-to-field gap remains. |
| KD-ShuffleNetV2 [37] | Classification | Aggregated multi-source tomato datasets | ∼95% accuracy; ∼1.27 M params; edge-friendly, with gains from self-distillation. |
| YOLOv11m [68] | Detection | Curated detection dataset | High mAP on curated data; cross-dataset evaluation and modality ablations still needed. |
| LAFANet [48] | Image-text retrieval | TLDITRD | R@1 ≈ 81.7% (I→T), 80.3% (T→I); sensitive to text noise and pair scarcity. |
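
Retrieval results such as the R@1 values reported for LAFANet are Recall@K scores: the fraction of queries whose correct counterpart appears among the top K retrieved items. A minimal sketch, assuming a one-to-one pairing between images and descriptions (datasets with several descriptions per image need a looser hit criterion) and a hypothetical similarity matrix:

```python
def recall_at_k(sim, k, image_to_text=True):
    """Recall@K for one-to-one paired retrieval: image i matches text i.
    sim[i][j] is the similarity between image i and text j."""
    n = len(sim)
    hits = 0
    for i in range(n):
        # Scores of all candidates for query i in the chosen retrieval direction.
        scores = sim[i] if image_to_text else [sim[j][i] for j in range(n)]
        ranked = sorted(range(n), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / n

# Hypothetical 3 x 3 similarity matrix (rows: images, columns: descriptions).
sim = [[0.9, 0.1, 0.3],
       [0.2, 0.4, 0.8],
       [0.1, 0.7, 0.6]]
print(round(recall_at_k(sim, 1), 3), recall_at_k(sim, 2))  # 0.333 1.0
print(round(recall_at_k(sim, 1, image_to_text=False), 3))  # 0.333
```

I→T and T→I scores are computed from the same matrix by ranking along rows or columns, which is why the two directions can differ slightly, as in the LAFANet row.
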

| Family | Representative Models | Backbone | Params | Modality | Key Strengths/Limitations |
|---|---|---|---|---|---|
| Convolutional | ResNet variants; EfficientNet | ResNet-50/101; EffNet-B0/B3 | 20–40 M | RGB | Mature and stable; strong on PlantVillage; may be too heavy for edge devices; needs adaptation for field images. |
| Lightweight CNN | ShuffleNetV2; MobileNetV2; KD-ShuffleNetV2 [37] | Depthwise/channel-shuffle CNNs | 1–3 M | RGB | Mobile-friendly; benefits from transfer + self-distillation; limited capacity for complex multimodal tasks. |
| Transformer-based | DeiT; EMA-DeiT [6] | ViT/DeiT | 20–30 M | RGB | Strong with pretraining; flexible; needs strong regularization/augmentation; higher memory footprint. |
| Object detectors | YOLOv5/YOLOv8/YOLOv11m [45,46,68] | One-stage detectors | 10–30 M | RGB (bbox) | Localize lesions/leaves; sensitive to annotation quality; heavier than classifiers. |
| Multimodal fusion | LAFANet [48]; image–molecular prototypes | ViT + BERT-like | 40 M+ | Image+text (+mol.) | Richer decision support; interpretable fusion; requires paired data and more complex training. |
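
Much of the parameter gap between standard and lightweight CNNs comes from depthwise separable convolutions, which MobileNetV2 and ShuffleNetV2 units build on. The sketch below counts weights for a single layer with illustrative sizes (3 × 3 kernel, 256 input and output channels), ignoring biases and normalization parameters; the function names are hypothetical.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution (bias omitted, as is usual before BatchNorm)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) + 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 256, 256)
sep = depthwise_separable_params(3, 256, 256)
print(std, sep, round(sep / std, 3))  # 589824 67840 0.115
```

The ratio equals 1/c_out + 1/k² exactly, so for 3 × 3 kernels the saving approaches 9× as the channel count grows.
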

| Platform | Typical Constraints | Recommended Model Traits |
|---|---|---|
| Smartphone | Limited battery; variable camera quality | ≤5 M params; fast inference; strong augmentation and domain generalization (DG) |
| Embedded board | Limited RAM/compute; continuous operation | Lightweight backbone; pruning/quantization; robustness to noise |
| Greenhouse gateway | Multi-sensor synchronization; intermittent missing data | MoE/hybrid fusion; robustness to missing modalities (e.g., via modality dropout) |
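
The parameter budgets and the pruning/quantization trait in this table translate into memory with simple arithmetic. The sketch below, assuming symmetric per-tensor int8 post-training quantization (a common scheme, though per-channel scales are also widely used), estimates weight storage and quantizes a toy weight vector. Function names are hypothetical; production deployments would rely on the quantization tooling of their inference framework.

```python
def model_size_mb(num_params, bytes_per_param):
    """Weight storage in megabytes (10**6 bytes), ignoring activations and metadata."""
    return num_params * bytes_per_param / 1e6

def quantize_int8_symmetric(weights):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

# fp32 vs int8 storage for a ~1.27 M-parameter model (cf. KD-ShuffleNetV2 above).
print(model_size_mb(1_270_000, 4), model_size_mb(1_270_000, 1))  # 5.08 1.27
w = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8_symmetric(w)
print(q)  # [50, -127, 3, 100]
```

Moving from 32-bit floats to int8 cuts weight storage by 4×, with a per-weight rounding error of at most half the scale; whether accuracy survives still has to be checked on field images.
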

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Hu, Y.; Li, H.; Yang, C.; Chen, N.; Pan, Z.; Ke, W. Challenges and Opportunities in Tomato Leaf Disease Detection with Limited and Multimodal Data: A Review. Mathematics 2026, 14, 422. https://doi.org/10.3390/math14030422

