Domain-Specific Self-Supervised Pretraining for Low-Resource Multi-Crop Plant Disease Recognition

Radočaj, Petra; Jurišić, Mladen; Radočaj, Dorijan

doi:10.3390/agriculture16070716

by Petra Radočaj ¹, Mladen Jurišić ² and Dorijan Radočaj ²,*

¹ Layer d.o.o., Vukovarska Cesta 31, 31000 Osijek, Croatia

² Faculty of Agrobiotechnical Sciences Osijek, Josip Juraj Strossmayer University of Osijek, Vladimira Preloga 1, 31000 Osijek, Croatia

* Author to whom correspondence should be addressed.

Agriculture 2026, 16(7), 716; https://doi.org/10.3390/agriculture16070716

Submission received: 27 February 2026 / Revised: 20 March 2026 / Accepted: 23 March 2026 / Published: 24 March 2026

(This article belongs to the Special Issue Diseases Diagnosis, Prevention and Weeds Control in Crops—2nd Edition)

Abstract

The threat of plant diseases in economically significant crops of the Solanaceae family, especially tomatoes and potatoes, is a significant challenge to global food security, highlighting the necessity of fast and convenient diagnostic methods. This paper introduces an enhanced MobileNetV2 model to perform automated disease classification through the use of a domain-specific self-supervised learning (SSL) pretraining approach. The model was first trained on 54,303 unlabeled plant images to learn basic botanical representations, followed by fine-tuning under six experimental conditions to optimize disease classification performance. Findings show that SSL pretrained weights consistently outperform traditional ImageNet-based transfer learning, achieving 0.9158 overall accuracy and a weighted F1-score of 0.9143 in joint tomato and potato classification. The model demonstrates strong cross-crop generalization, correctly identifying Early Blight and Late Blight with accuracies of 0.9600 and 0.9359, respectively, and effectively separating disease-specific visual symptoms from host morphology. Confusion matrix analysis further indicates a reduction in misclassification of visually similar necrotic lesions, a common challenge in supervised models. Overall, the proposed SSL architecture enhances the performance of lightweight convolutional neural networks (CNNs) to a large extent, providing a strong, computationally efficient solution for field-deployable diagnostics in precision agriculture, particularly for tomato and potato crops.

Keywords: domain-specific pretraining; cross-crop generalization; phytopathological features; lightweight architecture; leaf disease classification; PlantVillage dataset

1. Introduction

Global agricultural production is currently at a critical crossroads, driven by the imperative of feeding a population projected to reach nearly 10 billion by 2050 [1,2]. In this regard, Solanaceae-family crops, especially potatoes (Solanum tuberosum) and tomatoes (Solanum lycopersicum), are of critical importance to global food security and economic stability [3,4]. With over 170 million tons of annual production, tomatoes are the most produced vegetable globally and account for about 15% of the total vegetable consumption [5]. At the same time, potatoes are also a vital source of carbohydrates for billions of people, and their systematic protection greatly contributes to the alleviation of food instability [6]. Based on FAOSTAT data [7], the average yield for potatoes across the globe was 22.9 t/ha in 2024, a 41% increase from 16.2 t/ha in 2000. Tomato production yielded 36.8 t/ha in 2024, almost a 30% increase from the beginning of the millennium. The global yield trends for potatoes and tomatoes are presented in Figure 1. Although these statistics indicate advancements in agricultural techniques and selective propagation, increased production means that these crops become more susceptible to massive epidemics. Early Blight (Alternaria solani) and Late Blight (Phytophthora infestans) are important pathogens that can quickly halt or diminish these productivity advancements [8,9,10]. Therefore, the development of efficient, low-resource diagnostic tools is imperative to sustain this growth and ensure long-term global food security [11].

Plant diseases pose significant obstacles to the development of sustainable agriculture, leading to an estimated annual economic loss of up to 220 billion USD [12]. Annual losses are the result of crop-destroying pathogens like fungi, bacteria, and viruses, which can decimate crops within a single growing season. For example, particular crops like sunflowers experience yield losses caused by pathogens of between 30% and 90% [13]. Additionally, a single potato disease, Late Blight, is responsible for annual global economic losses of almost 6.7 billion USD [14,15]. In the tomato-farming sector, annual yield losses attributed to foliar diseases are between 8% and 10% [16]. The outbreaks of plant diseases are an immediate danger to crops and, therefore, to the global food supply, which makes the development of accurate and timely disease detection methods a necessity. Conventional diagnostic techniques, however, suffer from serious drawbacks. Laboratory-based methods like the Polymerase Chain Reaction (PCR) method are very reliable, but they are also slow, costly, and often inaccessible for smallholder farmers living in remote and developing countries [17]. Visual inspection, on the other hand, is very subjective and tends to detect a disease only once the symptoms are obvious, by which time the infection is often too advanced for effective intervention [10,17].

Modern agriculture disciplines utilize digital image processing and artificial intelligence (AI) to help recognize crop symptoms automatically and objectively, allowing quicker and broader disease detection [18,19]. Disease detection spans from traditional ground surveying to drone and satellite high-altitude imaging [20]. Hyperspectral imaging can sense physiological stress from necrotic symptoms before they can be seen, while standard RGB cameras only allow analysis of visible necrotic symptoms [21]. Despite standard RGB cameras being suitable for wider use and cheaper to obtain, hyperspectral systems can cost anywhere from 20,000 USD to 50,000 USD, which makes their use more costly and less practical [22]. Regardless of the used hardware, the computational techniques used are dependent on the analysis of images, and over time, these techniques have developed from using color-tracking algorithms to using advanced deep neural networks that are capable of identifying features across multiple levels [23].

In recent years, deep learning and CNNs have greatly changed the process of identifying plant diseases [24]. CNNs learn to identify and distinguish images of diseases without requiring engineers to pre-process the images. ResNet, VGG, Inception, and EfficientNet are examples of image classification models that have been proven to identify more than 90.00% of a variety of plant leaf diseases [25]. As an example, explainable AI visualization techniques, like saliency maps, have shown that certain specialized models can achieve near-perfect precision in identifying Blights on potato and tomato leaves by focusing on the relevant necrotic areas [26,27]. In addition, hybrid models, including CNNs and LSTMs for temporal analysis, attention for feature localization, and GANs to improve the model’s performance and robustness (i.e., through the addition of synthetic data), have also shown evidence of improving analysis accuracy while keeping the required computational demand for mobile deployable models low [28,29,30].

Automated diagnostics are more efficient than the alternatives; however, they still have significant limitations. One challenge is referred to as “domain shift,” in which models lose as much as 30% accuracy when moving from lab-based training environments to real-world scenarios due to factors such as lighting, occlusions, and varying distractor backgrounds [22,31]. Supervised deep learning also requires large amounts of labeled data, and manually annotating thousands of images demands expert knowledge, creating a bottleneck for scaling models [26]. Class imbalance and the limited interpretability of models, often called the black-box problem, are also among the main challenges of most current methodologies. Diagnostic pipelines built on sophisticated computational algorithms have long been considered prohibitively costly because they require high-performance GPU clusters. Nonetheless, with recent developments in smartphone camera technologies, more accessible, real-time diagnostic applications are slowly becoming possible under field conditions without reliance on expensive computational infrastructure [32].

To address these limitations, this study focuses on SSL and few-shot learning as strategies to overcome the scarcity of labeled data. SSL allows models to learn meaningful visual representations from unlabeled images by capturing the intrinsic morphological patterns of plant leaves. While SSL has been explored in previous agricultural studies, the novelty of this work lies in its integration of domain-specific SSL pretraining on the large PlantVillage dataset, the evaluation of a lightweight MobileNetV2 backbone optimized for resource-constrained devices, and multi-stage fine-tuning experiments to assess cross-crop generalization between tomatoes and potatoes, which share highly similar visual symptoms for Early Blight and Late Blight. By pretraining on a broad botanical domain, the model captures subtle morphological features that generic ImageNet-based weights may overlook. Such pretraining can build a strong feature base that generalizes much more efficiently than generic weights obtained from non-agricultural datasets [33]. This capability is especially important in low-resource settings, where labeled samples are scarce. This study specifically applies the MobileNetV2 architecture, a small but efficient model optimized for edge computing, to offer a practical solution for real-time identification [34]. This approach enables the model to remain computationally efficient while providing insight into the transferability of features across closely related crops, supporting in-field disease diagnosis. The main scientific contribution of this study is the determination of the transferability of features between taxonomically related species of the Solanaceae family. As tomatoes and potatoes are prone to the same pathogens, namely Alternaria solani and Phytophthora infestans, they have almost identical visual symptoms, such as concentric rings and necrotic water-soaked lesions [35].

The most important contributions of this paper include the following:

A lightweight MobileNetV2 backbone is systematically evaluated within a domain-specific SSL framework, demonstrating its applicability to data-scarce agricultural settings where annotating data at scale is not feasible.
Through a thorough comparison of domain-specific SSL pretraining and standard ImageNet transfer learning, this study provides empirical evidence that SSL produces more robust features and enables better cross-species transfer for visually similar tomato and potato pathogens.
The transferability of learned visual representations is studied among species of the same family, Solanaceae, with a focus on Early Blight and Late Blight, which produce very similar morphological symptoms across hosts.
The feasibility of deploying lightweight SSL-trained models on resource-constrained hardware is presented, along with their promise for real-time, in-field disease classification and scalable implementation in precision agriculture.

The structure of this paper is as follows: Section 2 discusses some of the studies that are relevant. In Section 3, the models and methodologies used in this research are described with an overview of the six different experimental setups. Section 4 is a summary of the research findings. Section 5 contains concluding remarks and future works.

2. Related Works

Deep learning has transformed automated plant disease diagnosis. However, the shortcomings of supervised learning in data-scarce environments still pose a considerable challenge. Recent studies have focused on self-supervised learning and domain-specific transfer learning to enhance model robustness and generalization in numerous agricultural settings, especially when labeled data are costly or difficult to obtain.

One of the key innovations in this field is the use of SSL to reduce the burden of laborious annotation. To bridge the gap between laboratory and in-field imagery, Huan et al. [33] created a unified SSL framework that combines three complementary tasks: Bootstrap Your Own Latent (BYOL) for global alignment, Masked Image Modeling (MIM) for learning local structures, and contrastive learning for instance-level discrimination. Their hybrid loss function, built on a ResNet101 backbone, can learn transferable representations from unlabeled data. Their framework obtained 77.82% accuracy and a 77.48% macro-averaged F1-score on the difficult PlantDoc dataset, which has complicated in-field backgrounds. Its best performance, however, came after fine-tuning on the PlantVillage dataset, where it attained 99.85% accuracy. This highlights the potential of SSL to capture domain-specific characteristics that supervised models, usually trained on generic datasets such as ImageNet, do not detect.

Safonova et al. [36] also examined the effectiveness of SSL and gave a comparative analysis of supervised and self-supervised paradigms using an agricultural subset of the LUCAS dataset. Their study addressed the problem of limited samples, and they tested several architectures, including VGG16, ResNet, and MobileNetV2. For the SSL component, they used the VICReg algorithm. Their results are especially relevant to low-resource settings, showing that fine-tuned SSL models can perform better than conventional supervised models with only 5% of the labeled training data. Notably, their SSL model demonstrated an ability to form semantically relevant visual clusters of crops, such as cereal-like and broadleaf crops, using only the morphological information in the unlabeled pretraining step.

This difference between laboratory and field performance is further highlighted by research on particular crops, including potatoes and tomatoes. To overcome the limited coverage of current systems in remote areas, Mulugeta et al. [37] created a specialized CNN to identify potato leaf diseases. Their architecture, trained on 16,000 images, reached 85.00% accuracy and an 86.00% F1-score on field data, far higher than pretrained models such as VGG19 and ResNet50. Likewise, Sinshaw et al. [38] examined the use of InceptionV3 and transfer learning to detect potato Late Blight. To address the limitations of insufficient data, they combined 596 benchmark images with 430 images collected by the authors themselves. They reported an 87.00% score on unseen data with their model based on data augmentation and 5-fold cross-validation, which highlights the importance of robust augmentation in small-data regimes. For tomato pathology, Ahmad et al. [39] optimized several architectures and noted a steady 10% to 15% drop in performance between laboratory and field conditions. Specifically, the precision of InceptionV3 dropped from 93.40% in the laboratory to 85.00% on field data, highlighting the impact of domain shift. To address environmental variability and privacy concerns, Cristea and Dobre [40] suggested a federated transfer learning scheme using a hybrid Graph-SNN model. Although it achieved 94.45% accuracy on laboratory data, its performance fell to 62.02% under field conditions, highlighting the ongoing difficulty of capturing real-world variability, such as dense vegetation and evolving symptom expression, within existing AI architectures.

Wu and Liu [41] introduced a reconstruction–generation network (RGN) with attention mechanisms to deal with scarce and noisy annotations of greenhouse crop images. Using unsupervised VAE-based pretraining with subsequent supervised fine-tuning, their model reported the highest classification accuracy of 98.03%, indicating that self-supervised reconstruction may be effective at identifying discriminative regions and enhancing recognition when labels are limited. Similarly, Kabya et al. [42] applied SimSiam-based self-supervised pretraining to a curated Malabar spinach leaf dataset to address the shortage of annotations. Their SpinachCNN and SimSiam-CBAM models, which reached 97–98% accuracy and were interpreted using Grad-CAM, demonstrated that SSL can be successfully used with both attention-based and transformer-based architectures to improve real-world accuracy and feasibility on agricultural data.

Despite these advancements, several research gaps remain. The vast majority of SSL studies rely on high-capacity backbones like ResNet101 or ResNeXt-50, which are computationally intensive and poorly suited to real-time deployment in the field. By contrast, limited attention has been given to lightweight architectures such as MobileNetV2, which are better suited to the resource-constrained mobile devices commonly used in agricultural settings. Furthermore, while multi-crop analysis is frequently mentioned, there is scant evidence on how domain-specific pretraining affects fine-tuning performance between taxonomically related species under limited labeled data regimes. To address these gaps, we evaluate a domain-specific SSL pretrained MobileNetV2 architecture focused on the Solanaceae family, demonstrating that a lightweight model can achieve robust cross-crop generalization between tomatoes and potatoes while remaining computationally efficient for field deployment.

Tomatoes (Solanum lycopersicum) and potatoes (Solanum tuberosum) share high morphological similarities and are susceptible to the same fungal and oomycete pathogens [43,44]. In particular, Early Blight (Alternaria solani) and Late Blight (Phytophthora infestans) exhibit almost identical visual symptoms, such as concentric rings and necrotic water-soaked lesions, across both hosts [45,46]. Our study examines the transferability of visual representations in related crops by concentrating on these common pathological characteristics. We aim to demonstrate that a model pretrained on a broad botanical scale can effectively exploit the biological similarities within the Solanaceae family to achieve high diagnostic accuracy with a minimal number of labeled examples. By leveraging this approach, plant diseases can be classified in real time with minimal computational resources, making it suitable for agricultural applications.

3. Materials and Methods

This study evaluates the suitability of SSL for learning robust botanical features that enable automatic disease classification. Using the MobileNetV2 architecture, the present work examines how domain-specific pretraining on unlabeled plant images can address the fundamental bottlenecks of conventional supervised transfer learning in the agricultural domain. The strategy is implemented as a multi-stage experimental pipeline (Figure 2) that includes: (1) an initial self-supervised pretraining step intended to create generalized phytopathological representations; (2) a comparative benchmarking step in which the proposed model is compared with conventional ImageNet-based weights; (3) a sequence of specialized fine-tuning experiments that investigate cross-crop generalization; and (4) an overall multi-class evaluation step to determine how robustly the proposed model distinguishes between various pathogens in different hosts at the same time. This systematic setup enables an objective assessment of the model’s ability to recognize overlapping symptomatic morphology, as exhibited by Early Blight and Late Blight in various hosts of the Solanaceae family.

3.1. Dataset Characteristics and Preparation

All experiments utilize the PlantVillage dataset from the year 2015 [47], which is a highly regarded and extensive dataset containing images of leaves. The PlantVillage dataset consists of 54,303 RGB leaf images organized into 38 distinct classes that represent combinations of crop species and health/disease conditions. The images were captured under controlled conditions with consistent lighting and neutral backgrounds [47]. For the SSL step (ID 0), we began with 54,303 images to create a strong baseline on feature extraction. These images cover a diverse set of plant species and disease states beyond Solanaceae and were included without labels for self-supervised pretraining to learn general botanical features. Afterward, we focused on the Solanaceae family, particularly the potato (Solanum tuberosum) and tomato (Solanum lycopersicum) crops. These crops are of high importance as they display almost identical visual symptoms when infected by the same pathogens. This is illustrated in Figure 3. The dataset categories for the supervised tasks were:

Healthy (H): Leaves with no visible disease symptoms.
Early Blight (EB): Small dark spots that form concentric “target” rings.
Late Blight (LB): Large, dark, water-soaked lesions that spread quickly across the leaf.

The images for the supervised experiments (IDs 1–4) were split into three subsets (70% for training, 15% for validation, and 15% for testing) for a more reliable assessment. For Experiment 0, no splitting was done because self-supervised pretraining does not require label-based stratification; hence, the whole dataset of 54,303 images was used. The complete breakdown of the images used in all stages of the experiments can be found in Table 1. While the PlantVillage dataset mainly consists of images captured under controlled laboratory conditions, we acknowledge that field conditions present additional challenges such as variable lighting, complex backgrounds, and occlusions. To specifically address this, extensive data augmentation was applied during training to improve the model’s robustness to such variations.
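The 70/15/15 partition described above can be illustrated with a minimal sketch. It assumes a plain shuffled split with a fixed seed; the function name `split_indices` and the seed are hypothetical, and the text does not state whether the authors stratified the split by class.

```python
import random

def split_indices(n_images, seed=0):
    """Shuffle image indices and split them 70/15/15 into train/val/test."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)   # fixed seed keeps the split reproducible
    n_train = n_images * 70 // 100         # integer arithmetic avoids float rounding
    n_val = n_images * 15 // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]       # the remainder forms the test subset
    return train, val, test

train, val, test = split_indices(1000)
print(len(train), len(val), len(test))
```

Because the three lists are slices of one shuffled sequence, no image can appear in more than one subset.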

For consistency with the MobileNetV2 architecture, each image was processed using the same pipeline and then resized to 224 × 224. In order to improve the model’s generalization ability to different environments, we implemented an extensive data augmentation technique including random horizontal and vertical flips, as well as brightness, contrast, and saturation adjustments. These techniques will improve robustness to variations in lighting and leaf orientation that are commonly encountered in field conditions.
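As an illustration of the flips and brightness adjustment named above, the following sketch operates on a toy image stored as nested lists of RGB tuples. It is not the authors’ Keras/TensorFlow pipeline: the 50% flip probabilities, the ±20% brightness range, and all function names are assumptions, and the contrast and saturation adjustments are omitted for brevity.

```python
import random

def flip_horizontal(image):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Reverse the row order (top-bottom flip)."""
    return image[::-1]

def adjust_brightness(image, factor):
    """Scale every channel value and clamp it to the 0-255 range."""
    return [[tuple(min(255, max(0, round(c * factor))) for c in px) for px in row]
            for row in image]

def augment(image, rng, max_brightness_delta=0.2):
    """Apply each transform at random; probabilities and ranges are assumed."""
    if rng.random() < 0.5:
        image = flip_horizontal(image)
    if rng.random() < 0.5:
        image = flip_vertical(image)
    factor = 1.0 + rng.uniform(-max_brightness_delta, max_brightness_delta)
    return adjust_brightness(image, factor)

# Toy 2 x 2 RGB image standing in for a 224 x 224 leaf photo.
toy = [[(10, 20, 30), (40, 50, 60)],
       [(70, 80, 90), (100, 110, 120)]]
augmented = augment(toy, random.Random(42))
```

Applying the transforms only to training batches, and leaving validation and test images untouched, keeps the reported metrics comparable across experiments.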

We performed model development and training using the Keras [48] and TensorFlow [49] libraries in Python 3.11 on the Kaggle platform with NVIDIA GPU support. Each model was trained for 30 epochs with a batch size of 64 while using the Adam optimizer with a learning rate of 1 × 10⁻⁴. The Adam optimization algorithm was selected to update the model’s parameters for a stable and efficient training process.
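For readers unfamiliar with Adam, a single-parameter version of its update rule is sketched below with the stated learning rate of 1 × 10⁻⁴. The remaining hyperparameters (`beta1`, `beta2`, `eps`) are the Keras defaults and are an assumption here, since the text reports only the learning rate; the toy loss is purely illustrative.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

w, m, v = 1.0, 0.0, 0.0
for step in range(1, 4):
    grad = 2 * w                                # gradient of the toy loss w**2
    w, m, v = adam_step(w, grad, m, v, step)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate regardless of the gradient’s magnitude, which is one reason Adam tends to be stable early in training.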

3.2. MobileNetV2 Architecture and Self-Supervised Learning Approach

This study is technically based on the incorporation of the MobileNetV2 architecture into an SSL framework. The choice of neural network is very important in agricultural diagnostics, particularly when the models are intended for use on mobile devices or edge computing equipment [50]. MobileNetV2 was selected because it offers a compromise between computational efficiency and recognition accuracy. Its architecture relies on inverted residual structures and bottleneck layers with depthwise separable convolutions [51,52]. In contrast to standard convolutions, where all input channels are filtered and combined in a single operation, MobileNetV2 does this in two steps: a depthwise convolution as a spatial filter and a pointwise convolution as a feature combiner. This reduces the number of parameters and the computation (FLOPs) required while retaining the capacity to capture complex botanical textures and patterns [53].
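The parameter saving from the two-step factorization can be checked with simple arithmetic. The sketch below counts weights (biases ignored) for a standard convolution and for its depthwise separable counterpart; the layer sizes (3 × 3 kernel, 32 input and 64 output channels) are illustrative assumptions, not values taken from the paper.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k convolution mapping c_in channels to c_out channels."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k filter plus a 1 x 1 pointwise combiner."""
    depthwise = k * k * c_in      # one k x k spatial filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise

k, c_in, c_out = 3, 32, 64
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(std / sep, 2))
```

For these sizes the separable layer needs 2336 weights instead of 18,432, a reduction of roughly 7.9 times, and the saving grows with the number of output channels.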

Training was done in two stages to maximize feature robustness. In the first stage, the whole dataset of 54,303 PlantVillage images was subjected to an SSL approach (ID 0). Unlike common transfer learning approaches that rely on models pretrained on ImageNet, which contains general objects such as vehicles and animals [23], our domain-specific pretraining allows the network to capture plant-specific morphological features. The network solved a pretext task on raw plant images without manual labels, learning characteristics such as leaf margin symmetry, venation patterns, and the texture of necrotic tissue [54]. With this domain-specific initialization, the model was primed to detect leaf structures before moving on to disease-specific classification.

In the second stage, supervised fine-tuning was conducted in six different experimental scenarios (IDs 1–4), transferring the pretrained weights to classify leaves as Healthy, Early Blight, or Late Blight. The fine-tuning head consisted of a 256-unit dense layer with a Mish activation function, followed by batch normalization and the final output layer. Such transfer is useful for the Solanaceae family, as the model has already acquired the general structure of healthy potato and tomato leaves. Supervised fine-tuning aims to identify the fine-grained differences in lesion morphology, for example, the concentric “target-like” rings of Early Blight (Alternaria solani) and the broader, water-soaked lesions of Late Blight (Phytophthora infestans) [35,38].
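The Mish activation used in the fine-tuning head is defined as mish(x) = x · tanh(softplus(x)), with softplus(x) = ln(1 + eˣ). A minimal scalar sketch follows; it only illustrates the function itself and is not the authors’ Keras layer.

```python
import math

def softplus(x):
    """Numerically stable ln(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x):
    """Mish activation: smooth, non-monotonic, close to identity for large x."""
    return x * math.tanh(softplus(x))

for value in (-2.0, 0.0, 1.0, 5.0):
    print(value, round(mish(value), 4))
```

Unlike ReLU, Mish lets small negative inputs pass through with a small negative output instead of zeroing them, which keeps gradients flowing through the dense layer.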

The resulting diagnostic framework combines the lightweight design of MobileNetV2 with a dual-stage training strategy to achieve computational efficiency together with the ability to generalize across related crop species.

3.3. Validation Strategy and Performance Metrics

We implemented a thorough validation process at every stage of our experiments to evaluate the effectiveness of the SSL pretrained MobileNetV2 model. We aimed to assess the model’s ability to generalize its understanding of plant pathologies beyond a single crop type. To confirm that the model learns meaningful biological features rather than memorizing the training data, each experiment subset was evaluated on a separate test set, split as outlined in Section 3.1, that was kept hidden from the model during the training and validation phases.

Performance quantification of the models employed standard classification metrics such as precision, recall, F1-score, and accuracy. These metrics were determined as outlined in Equations (1)–(4):

$$\text{Precision} = \frac{TP}{TP + FP}, \tag{1}$$

$$\text{Recall} = \frac{TP}{TP + FN}, \tag{2}$$

$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \tag{3}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}, \tag{4}$$

where TP, TN, FP, and FN correspond to the number of true positives, true negatives, false positives, and false negatives, respectively. Considering the small discrepancies observed in the number of images in some experiments between the tomato and potato samples, we used both macro and weighted averages in our final assessment. The macro average considered the performance of each disease class equally, irrespective of the class’s size, while the weighted average reflected the model’s performance in proportion to the number of images in each class. Furthermore, confusion matrices were produced for each experiment to study the specific patterns of misclassification between morphologically similar diseases, such as Early Blight and Late Blight.
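The metrics in Equations (1)–(3) and the two averaging schemes can be computed directly from a confusion matrix, as in the sketch below. The counts are hypothetical and are not taken from Table 2; overall accuracy is computed in its usual multi-class form, as the share of correctly classified images.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of images of true class i predicted as class j."""
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        fn = sum(cm[c]) - tp                        # true c, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({"precision": precision, "recall": recall,
                        "f1": f1, "support": sum(cm[c])})
    return results

def summarize(cm):
    """Overall accuracy plus macro- and support-weighted F1."""
    metrics = per_class_metrics(cm)
    total = sum(m["support"] for m in metrics)
    accuracy = sum(cm[i][i] for i in range(len(cm))) / total
    macro_f1 = sum(m["f1"] for m in metrics) / len(metrics)
    weighted_f1 = sum(m["f1"] * m["support"] for m in metrics) / total
    return {"accuracy": accuracy, "macro_f1": macro_f1, "weighted_f1": weighted_f1}

# Hypothetical counts for Healthy, Early Blight, Late Blight (rows = true class).
cm = [[50, 0, 0],
      [2, 20, 8],
      [1, 5, 14]]
print(summarize(cm))
```

In this toy matrix the large, easy Healthy class lifts the weighted F1 above the macro F1, which is the same pattern that a gap between macro and weighted scores signals in the experiments.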

4. Results and Discussion

The findings of this paper show that domain-specific pretraining is highly beneficial for agricultural disease classification. Experiment ID 0 was the first step in this process, during which the MobileNetV2 backbone was pretrained in a self-supervised manner on the PlantVillage dataset. The effectiveness of this step can be seen in the training loss curve in Figure 4. As indicated, the mean training loss in ID 0 decreased rapidly during the initial five epochs, from about 3.14 to 2.91. Such fast initial convergence indicates that the model quickly discovered high-level visual regularities inherent in the unlabeled plant images [33]. Following this sharp decline, the loss kept decreasing gradually and, by the 30th epoch, had plateaued at approximately 2.87, which indicates stable convergence.

The steady convergence in ID 0 shows that the self-supervised pretext task stimulated the model to grasp meaningful latent representations for plant morphology. Unlike standard supervised learning, where optimization is guided directly by category labels, the SSL objective encouraged the MobileNetV2 architecture to learn the intrinsic structural variability present across the 54,303 images in an unsupervised manner. This is particularly advantageous for agricultural image analysis, where disease-related visual cues are often embedded within complex backgrounds characterized by leaf venation patterns, varying illumination levels, and natural environmental variability [55,56].

The gradual approach of the loss to a plateau also indicates that the model progressively encoded higher-order structural patterns relevant to plant health and pathology [57]. As a result, the subsequent series of supervised experiments (IDs 1–4) started from a well-tuned feature space, which can be regarded as a warm start for downstream classification. This setup enabled the model to generalize across potatoes and tomatoes since, by this time, it had acquired the general leaf and lesion patterns shared by the two host plants. In particular, the convolutional filters were already tailored to perceive the relevant low- and mid-level features, such as chlorotic areas, necrotic spots, and textural irregularities that are typical of Solanaceae leaves. This highly specific initialization serves as an implicit regularizer, which reduces overfitting during fine-tuning on smaller labeled subsets of potato and tomato images. Rather than learning from randomly initialized weights, the model began supervised training from a representation space already aligned with domain-specific visual structures, thereby improving generalization and data efficiency [33,36].

The supervised experiments offered additional confirmation of the model’s diagnostic capabilities across different scenarios. The various performance metrics across all experimental configurations are aggregated in Table 2.

One of the most important results is observed in ID 1, where the SSL pretrained model reached an accuracy of 0.9158, well above the 0.8577 reached by the standard ImageNet-based version. In addition to accuracy, the macro F1-score of 0.9013 for the SSL method versus 0.8392 for ImageNet indicates a better ability to balance precision and recall across all classes. This implies that weights trained on common objects have some form of feature blindness to the micro-textures of plant pathology [36]. In comparison, domain-specific pretraining enabled the convolutional filters to focus on cellular-level anomalies, yielding a weighted precision of 0.9261 in ID 1 and hence highlighting the ability of the model to reduce false-positive identifications in a mixed-crop setting.
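The size of the improvement in ID 1 can be made explicit with a short calculation on the reported figures. The dictionary and function names are illustrative; only the four scores come from the text.

```python
ssl_scores = {"accuracy": 0.9158, "macro_f1": 0.9013}
imagenet_scores = {"accuracy": 0.8577, "macro_f1": 0.8392}

def gain(metric):
    """Return (absolute gain, relative gain in percent) of SSL over ImageNet."""
    absolute = ssl_scores[metric] - imagenet_scores[metric]
    relative = absolute / imagenet_scores[metric]
    return round(absolute, 4), round(100 * relative, 2)

for name in ("accuracy", "macro_f1"):
    print(name, gain(name))
```

The absolute gains are 0.0581 in accuracy and 0.0621 in macro F1, so the advantage of domain-specific pretraining is slightly larger on the class-balanced metric than on overall accuracy.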

The difference between the macro and weighted measures in ID 1 and ID 2b gives further insight into the behavior of the model. The weighted F1-score of 0.9143 in the joint classification task (ID 1) is marginally higher than the macro average (0.9013), which means that the model performs best on the visual patterns that are more frequent in the dataset. In ID 2b, however, the gap is larger: macro recall is 0.6759, whereas accuracy is 0.7245. This discrepancy indicates that, although the healthy predictions of the model are highly reliable (precision of 1.0000 in Table 3), the model has more difficulty with the intra-class variation in potato lesions. Botanically, the necrotic tissues of potato leaves tend to have more jagged edges and color gradients because of varying levels of chlorosis, which can add noise to the feature space of models such as MobileNetV2 [58,59,60].
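
The relationship between macro and weighted averages can be checked directly from the per-class values. The following minimal Python sketch recomputes both averages for ID 1 (SSL) from the per-class F1-scores and test-set class sizes reported in Table 3; the variable and function names are illustrative and are not taken from the authors' code.

```python
# Per-class F1-scores and test-set sizes for ID 1 (SSL), copied from Table 3.
# Order: tomato H, tomato EB, tomato LB, potato H, potato EB, potato LB.
f1_scores = [1.0000, 0.9167, 0.9130, 0.8750, 0.8339, 0.8693]
support = [231, 152, 297, 27, 158, 133]


def macro_average(values):
    """Unweighted mean: every class counts equally, regardless of size."""
    return sum(values) / len(values)


def weighted_average(values, weights):
    """Mean weighted by class size: frequent classes dominate the result."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


macro_f1 = macro_average(f1_scores)
weighted_f1 = weighted_average(f1_scores, support)

print(f"macro F1    = {macro_f1:.4f}")
print(f"weighted F1 = {weighted_f1:.4f}")
```

With these inputs, both values agree with the ID 1 (SSL) row of Table 2 to four decimal places (0.9013 and 0.9143). The weighted value is higher because the two largest classes (tomato Healthy and tomato Late Blight) have high F1-scores, whereas the small potato Healthy class (27 samples) carries equal weight only in the macro average.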

The strongest evidence of the biological abstraction of the model, however, lies in the pathogen-specificity experiments, namely ID 3a and ID 3b. The accuracy scores of 0.9600 for Early Blight and 0.9359 for Late Blight show that the model is able to decouple the symptoms of the disease from the background of the host. The matching macro and weighted F1-scores in ID 3a (both 0.9599) indicate that the model’s diagnostic performance remains highly consistent regardless of whether the leaf belongs to a tomato or a potato. This generalization suggests that the domain-specific pretraining phase (ID 0) guided the network to focus on universal pathological signifiers, such as the concentric, target-like rings of Early Blight, rather than on the generic structure of the leaf itself.

Lastly, ID 4 tested whether the model can distinguish between these two visually similar pathogens on mixed hosts at the same time. According to Table 2, the model achieved a weighted precision of 0.8824 and an accuracy of 0.8616. Although differentiating Early Blight from Late Blight is notoriously difficult because the two diseases share many necrotic traits, the weighted F1-score of 0.8567 indicates that the model remains reliable in complex situations. The close alignment between the macro and weighted F1-scores (0.8572 and 0.8567) suggests that the SSL-based initialization acts as an implicit regularizer, preventing the model from becoming biased toward any specific host–pathogen combination [61].

To further understand the model’s diagnostic behavior, a detailed breakdown of performance across individual crops and leaf statuses is provided in Table 3. This granular perspective allows for an evaluation of how SSL-derived features handle specific pathological symptoms compared to traditional transfer learning. A primary observation from Table 3 is the exceptional performance of the ID 1 (SSL) model on healthy tomato leaves, where it achieves an F1-score of 1.0000. In comparison, the ID 1 (ImageNet) model, while still accurate, shows lower precision (0.9130) for the same class. This discrepancy suggests that the SSL pretraining phase (ID 0) provided the model with a superior “understanding” of healthy botanical textures, allowing it to define a cleaner baseline for what constitutes a non-diseased leaf.

The advantage of the SSL approach is most apparent in the classification of Early Blight (EB) and Late Blight (LB). In ID 1 (SSL), the model achieves a very high recall of 0.9899 for tomato Late Blight, while the ImageNet-based model struggles with Early Blight identification on both crops, reaching recalls of only 0.6447 for tomatoes and 0.5063 for potatoes. This large gap suggests that ImageNet features often miss the subtle early symptoms of Early Blight, resulting in more false negatives [62]. In contrast, the SSL model reaches an Early Blight recall of 0.8684 on tomatoes and maintains a recall above 0.71 even for the challenging potato EB class, indicating better sensitivity to the characteristic target-like lesion patterns that generic visual models frequently overlook.

The pathogen-specific experiments, namely ID 3a and ID 3b, further confirm the model’s ability to generalize across species. As shown in Table 3, in ID 3a, the model identifies Early Blight with an F1-score of 0.9577 on tomatoes and an even higher F1-score of 0.9620 on potatoes. This near parity in performance is a crucial finding; it suggests that the model has successfully extracted the “core” visual signature of the Early Blight pathogen that transcends the host’s specific leaf geometry. Similarly, in ID 3b, the high recall for Late Blight on both tomatoes (0.9062) and potatoes (0.9933) reinforces the conclusion that domain-specific pretraining creates features that are robust to the slight morphological variations between host plants.

However, Table 3 also highlights the inherent difficulty of distinguishing between pathogens in mixed scenarios. In ID 4, although the model achieves an excellent F1-score of 0.9857 for potato Early Blight, the recall for potato Late Blight drops to 0.6038. This decrease suggests that some Late Blight lesions are misclassified as Early Blight when their visual symptoms overlap. This is a well-known challenge in phytopathology, since advanced lesions of both diseases can appear as similar necrotic tissue [38,60,63]. Nevertheless, the consistently high precision and recall for Early Blight across both hosts in ID 4 indicate that the SSL pretrained weights provide a stable representation of the most distinctive morphological features.

To show how the models reach their decisions, Figure 5 presents the confusion matrices for all experimental scenarios. These matrices highlight where misclassifications occur, especially between morphologically similar symptoms such as Early Blight and Late Blight. The comparison in Experiment 1 between SSL and ImageNet clearly demonstrates the advantage of domain-specific pretraining. In the ImageNet matrix, a large number of potato Early Blight samples (60) are misclassified as Late Blight, and 12 samples are even predicted as healthy tomato leaves. In contrast, the SSL matrix for ID 1 displays a much stronger diagonal pattern. Although some confusion between EB and LB remains due to their biologically similar necrotic lesions, the SSL model correctly identified 113 of the 158 potato EB samples, compared to only 80 with ImageNet initialization. This suggests that SSL-derived features are more sensitive to the subtle textural differences that distinguish these two pathogens [36].
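
The link between these confusion-matrix counts and the per-class metrics in Table 3 can be made explicit. The sketch below recomputes recall and F1 for the potato Early Blight class in ID 1 from the correct counts quoted above (113 and 80 out of the 158 test samples listed in Table 3); the false-positive count of zero is an assumption inferred from the precision of 1.0000 reported for this class under both initializations, and the function name is illustrative.

```python
def class_metrics(tp, fp, fn):
    """Return (precision, recall, F1) for one class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


N_POTATO_EB = 158  # potato Early Blight test samples in ID 1 (Table 3)

# Correctly classified potato EB samples quoted in the text.
correct = {"SSL": 113, "ImageNet": 80}

results = {}
for name, tp in correct.items():
    fn = N_POTATO_EB - tp  # all remaining potato EB samples were missed
    fp = 0                 # assumption: implied by precision = 1.0000
    results[name] = class_metrics(tp, fp, fn)
    p, r, f1 = results[name]
    print(f"{name:9s} precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```

With these counts, the recalls come out at 0.7152 (SSL) and 0.5063 (ImageNet) and the F1-scores at 0.8339 and 0.6723, matching the potato EB rows of Table 3.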

In Experiment 2a, the model demonstrates high confidence, with only a small overlap between EB and LB, where 35 Late Blight samples are classified as Early Blight. Experiment 2b proved more challenging: the model struggled more with Late Blight, misclassifying 76 samples as Early Blight. This pattern suggests that, on potato leaves, Late Blight lesions resemble the necrotic patterns caused by Early Blight more closely than they do on tomato leaves, posing an additional challenge for lightweight models.

The strengths of the proposed approach become most evident in the pathogen-specific matrices of Experiment 3. In ID 3a (EB), the model achieves near-perfect separation, with only 12 tomato samples classified as potatoes and no potato samples misclassified as tomatoes. This indicates that the model successfully prioritizes the characteristic visual signature of Early Blight over host-specific background features. A similar trend is observed in ID 3b (LB), where the model maintains a strong diagonal pattern, despite a small number of confusions (27 samples) between the two host species.

Finally, the matrix for Experiment 4 illustrates the robustness of the model in a high-complexity setting. Although the model misclassifies 55 potato Late Blight samples as tomato Late Blight, it differentiates Early Blight more consistently across both hosts. This supports the claim that the SSL-pretrained MobileNetV2 model can differentiate these important pathogens even when they appear simultaneously on taxonomically related plants.

To explore the decision-making process of the model in more detail and confirm its focus on phytopathological markers, we used Gradient-weighted Class Activation Mapping (Grad-CAM) and heatmap visualizations, as shown in Figure 6. The visual evidence confirms that the domain-specific weights of the SSL model enable the architecture to isolate necrotic regions from the rest of the leaf morphology. For Early Blight, the Grad-CAM maps for both potato and tomato hosts show high activation intensity over the characteristic concentric lesions. For Late Blight, the model localized the larger, water-soaked necrotic regions, indicating that it has learned the typical expansion patterns of Phytophthora infestans. Such interpretability is essential for field-deployable diagnostics because it makes it possible to verify that the predictions of the model are based on biologically relevant symptoms rather than on random background noise or a host-specific leaf-venation pattern. In turn, these visualizations offer a qualitative validation that the model can decouple pathogen-specific features from the underlying taxonomic framework of the Solanaceae family.
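
For readers unfamiliar with Grad-CAM, its core weighting step can be sketched in a few lines. The NumPy example below assumes that the activations of a convolutional layer and the gradients of the class score with respect to those activations have already been extracted by the deep learning framework; the layer choice, array shapes, toy values, and function name are illustrative assumptions rather than the configuration used in this study.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Combine feature maps into a class-specific heatmap (Grad-CAM).

    activations: array of shape (H, W, K), feature maps of one conv layer.
    gradients:   array of shape (H, W, K), d(class score)/d(activations).
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    # Channel importance: global average of the gradients over space.
    weights = gradients.mean(axis=(0, 1))  # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(activations, weights, axes=([2], [0]))  # (H, W)
    # ReLU keeps only regions that support the class.
    cam = np.maximum(cam, 0.0)
    # Normalize for display; leave an all-zero map unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example with two 2x2 feature maps (hypothetical values).
acts = np.zeros((2, 2, 2))
acts[0, 0, 0] = 1.0  # channel 0 fires in the top-left corner
acts[1, 1, 1] = 2.0  # channel 1 fires in the bottom-right corner
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))], axis=-1)

heatmap = grad_cam(acts, grads)
print(heatmap)
```

In this toy case, only the top-left cell stays active, because the channel with a negative average gradient is suppressed by the ReLU. In practice, the heatmap would be upsampled to the input resolution and overlaid on the leaf image.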

The SSL-pretrained MobileNetV2 model was evaluated alongside recent state-of-the-art approaches to assess its effectiveness. While high-performance methods, such as attention-based knowledge distillation or Inception-ResNet-V2-based Faster-RCNN, demonstrate strong diagnostic accuracy, they incur substantial computational costs, limiting their applicability on resource-constrained devices [64,65]. In contrast, the proposed model achieves competitive performance on both tomato and potato hosts with a lightweight architecture of just 3.02 million parameters, enabling near-real-time inference and offering a practical, scalable solution for field-level plant disease diagnostics in precision agriculture. The diagnostic accuracy achieved across a variety of experimental conditions indicates that the SSL-pretrained MobileNetV2 model has significant potential for real-time agricultural monitoring systems. Because of its lightweight architecture, the model is well suited to edge devices, e.g., smartphones or low-cost drones, so that smallholder farmers can run diagnostics on site without requiring a large cluster of GPUs or a live internet connection [30,51,52]. In a precision agriculture scenario, this framework could support automated spraying systems that distinguish Early Blight from Late Blight and apply minimal amounts of fungicide to the infected areas only, thus reducing the environmental impact and cost of operation [66].

To further confirm that the proposed solution is suitable for deployment on resource-constrained devices, its computational performance was assessed on the most comprehensive experimental setting (ID 1), which incorporates all tomato and potato classes (Healthy, Early Blight, and Late Blight). The resulting MobileNetV2 model has 3,017,158 trainable parameters and a total size of 36.65 MB, confirming its lightweight nature and feasibility for mobile applications. Inference performance was measured on a GPU, with an average processing time of 6.79 s per image for single-image inference (batch size of 1) and 0.107 s per image for batch processing (batch size of 64). These findings suggest that, although single-image inference is still computationally expensive, the model can run in near real time in batched settings, which are frequently used in deployed pipelines. Overall, these results indicate that the presented framework offers a favorable balance between accuracy and efficiency, which makes it applicable to real-world agricultural practice.
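
The reported timings can be converted into throughput figures with simple arithmetic. The short Python sketch below uses only the two per-image averages given above; the derived quantities (throughput, time per batch, and speed-up) are calculated here for illustration and are not reported measurements.

```python
# Average per-image inference times reported for ID 1 on a GPU (seconds).
time_per_image = {1: 6.79, 64: 0.107}  # key = batch size

for batch_size, seconds in time_per_image.items():
    throughput = 1.0 / seconds         # images processed per second
    batch_time = seconds * batch_size  # wall time for one full batch
    print(f"batch={batch_size:2d}: {throughput:.2f} images/s, "
          f"{batch_time:.2f} s per batch")

# Per-image speed-up of batched over single-image inference.
speedup = time_per_image[1] / time_per_image[64]
print(f"speed-up: {speedup:.1f}x")
```

Under these figures, a batch of 64 images takes roughly as long as a single image (about 6.8 s in both cases), which would be consistent with a fixed per-call overhead dominating single-image latency; this interpretation is an assumption and was not verified in the study.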

Nevertheless, some limitations should be noted to guide future studies. Although the model shows strong cross-crop generalization within the Solanaceae family, its results on potato leaves in experiment ID 2b and the confusion patterns in experiment ID 4 suggest that environmental noise (e.g., differences in lighting conditions, soil backgrounds, and complex leaf overlaps in the field) might remain an issue. Also, this research concentrated on two main pathogens; in a natural ecosystem, many nutrient deficiencies or secondary infections might mimic or obscure the symptoms of blight. Future work should explore the extension of the proposed SSL and MobileNetV2 framework to multi-disease recognition tasks. By leveraging domain-specific pretraining, the model can be fine-tuned to additional diseases with limited labeled data, enhancing its generalization and practical utility in diverse agricultural scenarios. Future efforts should also be directed towards extending the SSL pretraining to a wider range of real-world images, including field-captured datasets and other physiological stressors, to make the model more reliable under real agricultural conditions.

5. Conclusions

This study demonstrates that a domain-specific SSL framework is effective in agricultural disease diagnostics. Using the MobileNetV2 architecture, we showed that self-supervised pretraining on a larger botanical dataset in experiment ID 0 yields a high-quality feature space that outperforms standard ImageNet transfer learning. The findings from experiment IDs 1–4 show that this method not only achieves a diagnostic accuracy of up to 0.9600 for specific pathogens but also allows the model to identify symptoms that are shared across taxonomically related species. The ability of the model to dissociate pathological markers, such as those of Early Blight, from host morphology (potato or tomato) is a key milestone towards scalable and generalizable AI in agriculture. The proposed framework is lightweight, supporting near-real-time deployment on resource-constrained devices and serving as a practical tool for precision farming while helping reduce pesticide use. Nevertheless, intra-class variability remains a challenge for some crops, as observed in the potato experiments. Future research will aim to extend SSL pretraining to real-world datasets, taking into account environmental variability and physiological stress factors. Finally, this research offers a roadmap towards effective, low-cost diagnostic tools capable of sustaining crop productivity globally and improving food security.

Author Contributions

Conceptualization, P.R.; methodology, P.R.; software, P.R.; validation, D.R. and M.J.; formal analysis, P.R.; investigation, P.R.; resources, P.R.; data curation, P.R.; writing—original draft preparation, P.R.; writing—review and editing, P.R., D.R. and M.J.; visualization, D.R. and P.R.; supervision, M.J.; project administration, D.R.; funding acquisition, D.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original data presented in the study are openly available in the PlantVillage repository at https://doi.org/10.48550/arXiv.1511.08060 (accessed on 12 February 2026).

Conflicts of Interest

Author Petra Radočaj was employed by the company Layer d.o.o. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

1. Gerten, D.; Heck, V.; Jägermeyr, J.; Bodirsky, B.L.; Fetzer, I.; Jalava, M.; Kummu, M.; Lucht, W.; Rockström, J.; Schaphoff, S.; et al. Feeding Ten Billion People Is Possible within Four Terrestrial Planetary Boundaries. Nat. Sustain. 2020, 3, 200–208.
2. Wang, X. Managing Land Carrying Capacity: Key to Achieving Sustainable Production Systems for Food Security. Land 2022, 11, 484.
3. Sentil, S.; Choudhary, M.; Tirsaiwala, M.; Rvs, S.; Suresh, V.M.; Jacob, C.; Paret, M. TOMMicroNet: Convolutional Neural Networks for Smartphone-Based Microscopic Detection of Tomato Biotic and Abiotic Plant Health Issues. Phytopathology 2024, 114, 2431–2441.
4. Devaux, A.; Goffart, J.-P.; Kromann, P.; Andrade-Piedra, J.; Polar, V.; Hareau, G. The Potato of the Future: Opportunities and Challenges in Sustainable Agri-Food Systems. Potato Res. 2021, 64, 681–720.
5. Panno, S.; Davino, S.; Caruso, A.G.; Bertacca, S.; Crnogorac, A.; Mandić, A.; Noris, E.; Matić, S. A Review of the Most Common and Economically Important Diseases That Undermine the Cultivation of Tomato Crop in the Mediterranean Basin. Agronomy 2021, 11, 2188.
6. Li, L.; Zhu, T.; Wen, L.; Zhang, T.; Ren, M. Biofortification of Potato Nutrition. J. Adv. Res. 2025, 75, 23–34.
7. FAOSTAT. Available online: https://www.fao.org/faostat/en/#data/QCL (accessed on 23 February 2026).
8. Gold, K.M.; Townsend, P.A.; Chlus, A.; Herrmann, I.; Couture, J.J.; Larson, E.R.; Gevens, A.J. Hyperspectral Measurements Enable Pre-Symptomatic Detection and Differentiation of Contrasting Physiological Effects of Late Blight and Early Blight in Potato. Remote Sens. 2020, 12, 286.
9. Solomiichuk, M.; Pikovskyi, M. Biological Control of Alternaria and Late Blight of Potatoes. Plant Soil Sci. 2025, 16, 52–60.
10. Jindo, K.; Evenhuis, A.; Kempenaar, C.; Pombo Sudré, C.; Zhan, X.; Goitom Teklu, M.; Kessel, G. Review: Holistic Pest Management against Early Blight Disease towards Sustainable Agriculture. Pest Manag. Sci. 2021, 77, 3871–3880.
11. Leite, D.; Brito, A.; Faccioli, G. Advancements and Outlooks in Utilizing Convolutional Neural Networks for Plant Disease Severity Assessment: A Comprehensive Review. Smart Agric. Technol. 2024, 9, 100573.
12. About|Plant Production and Protection|Food and Agriculture Organization of the United Nations. Available online: https://www.fao.org/plant-production-protection/about/en?utm_source=chatgpt.com (accessed on 24 February 2026).
13. Gulzar, Y. Applications of Transfer Learning in Sunflower Disease Detection: Advances, Challenges, and Future Directions. Turk. J. Biol. 2025, 49, 534–549.
14. Guenthner, J.F.; Michael, K.C.; Nolte, P. The Economic Impact of Potato Late Blight on US Growers. Potato Res. 2001, 44, 121–125.
15. Adolf, B.; Andrade-Piedra, J.; Bittara Molina, F.; Przetakiewicz, J.; Hausladen, H.; Kromann, P.; Lees, A.; Lindqvist-Kreuze, H.; Perez, W.; Secor, G.A. Fungal, Oomycete, and Plasmodiophorid Diseases of Potato. In The Potato Crop: Its Agricultural, Nutritional and Social Contribution to Humankind; Campos, H., Ortiz, O., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 307–350. ISBN 978-3-030-28683-5.
16. Nawaz, M.; Nazir, T.; Javed, A.; Masood, M.; Rashid, J.; Kim, J.; Hussain, A. A Robust Deep Learning Approach for Tomato Plant Leaf Disease Localization and Classification. Sci. Rep. 2022, 12, 18568.
17. Yan, J.; Wu, H.; Diao, Z.; Miao, Y.; Zhang, B.; Zhao, C. Recent Developments and Applications of Crop Disease Detection, Prediction, and Early Warning: A Review. Engineering 2025, in press.
18. Jafar, A.; Bibi, N.; Naqvi, R.A.; Sadeghi-Niaraki, A.; Jeong, D. Revolutionizing Agriculture with Artificial Intelligence: Plant Disease Detection Methods, Applications, and Their Limitations. Front. Plant Sci. 2024, 15, 1356260.
19. Upadhyay, A.; Chandel, N.S.; Singh, K.P.; Chakraborty, S.K.; Nandede, B.M.; Kumar, M.; Subeesh, A.; Upendar, K.; Salem, A.; Elbeltagi, A. Deep Learning and Computer Vision in Plant Disease Detection: A Comprehensive Review of Techniques, Models, and Trends in Precision Agriculture. Artif. Intell. Rev. 2025, 58, 92.
20. Ngugi, L.C.; Abelwahab, M.; Abo-Zahhad, M. Recent Advances in Image Processing Techniques for Automated Leaf Pest and Disease Recognition—A Review. Inf. Process. Agric. 2021, 8, 27–51.
21. Ouhami, M.; Hafiane, A.; Es-Saady, Y.; El Hajji, M.; Canals, R. Computer Vision, IoT and Data Fusion for Crop Disease Detection Using Machine Learning: A Survey and Ongoing Research. Remote Sens. 2021, 13, 2486.
22. Shafay, M.; Hassan, T.; Owais, M.; Hussain, I.; Khawaja, S.G.; Seneviratne, L.; Werghi, N. Recent Advances in Plant Disease Detection: Challenges and Opportunities. Plant Methods 2025, 21, 140.
23. Bhargava, A.; Shukla, A.; Goswami, O.P.; Alsharif, M.H.; Uthansakul, P.; Uthansakul, M. Plant Leaf Disease Detection, Classification, and Diagnosis Using Computer Vision and Artificial Intelligence: A Review. IEEE Access 2024, 12, 37443–37469.
24. Sharma, N.; Sharma, P.; Kumar, N. Feature Engineering to Early Detection of Plant Disease Using Image Processing and Artificial Intelligence: A Comparative Analysis. Int. J. Latest Technol. Eng. Manag. Appl. Sci. 2025, 14, 1107–1113.
25. Qadri, S.A.A.; Huang, N.-F.; Wani, T.M.; Bhat, S.A. Advances and Challenges in Computer Vision for Image-Based Plant Disease Detection: A Comprehensive Survey of Machine and Deep Learning Approaches. IEEE Trans. Autom. Sci. Eng. 2025, 22, 2639–2670.
26. Hu, Y.; Li, H.; Yang, C.; Chen, N.; Pan, Z.; Ke, W. Challenges and Opportunities in Tomato Leaf Disease Detection with Limited and Multimodal Data: A Review. Mathematics 2026, 14, 422.
27. Alhammad, S.M.; Khafaga, D.S.; El-hady, W.M.; Samy, F.M.; Hosny, K.M. Deep Learning and Explainable AI for Classification of Potato Leaf Diseases. Front. Artif. Intell. 2025, 7, 1449329.
28. Sarawagi, K.; Dhiman, H.; Pagrotra, A.; Talwandi, N.S. Deep Learning for Early Disease Detection: A CNN Approach to Classify Potato, Tomato, and Pepper Leaf Diseases. In Proceedings of the 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT); IEEE: New York, NY, USA, 2024; pp. 1–7.
29. Abbas, A.; Jain, S.; Gour, M.; Vankudothu, S. Tomato Plant Disease Detection Using Transfer Learning with C-GAN Synthetic Images. Comput. Electron. Agric. 2021, 187, 106279.
30. Sharma, J.; Al-Huqail, A.A.; Almogren, A.; Doshi, H.; Jayaprakash, B.; Bharathi, B.; Ur Rehman, A.; Hussen, S. Deep Learning Based Ensemble Model for Accurate Tomato Leaf Disease Classification by Leveraging ResNet50 and MobileNetV2 Architectures. Sci. Rep. 2025, 15, 13904.
31. Kumar, S.; Sharma, Y.K.; Kumar, M.; Lilhore, U.K.; Aldossary, S.M.A.; Simaiya, S.; Khan, M.M.; Hussien, S.A.; Ghith, E.S. A Hybrid Deep Learning and Fuzzy Logic Framework for Robust Tomato Disease Detection and Classification. Sci. Rep. 2026, 16, 7002.
32. Osmenaj, Z.; Tseliki, E.-M.; Kapellaki, S.H.; Tselikis, G.; Tselikas, N.D. From Pixels to Diagnosis: Implementing and Evaluating a CNN Model for Tomato Leaf Disease Detection. Information 2025, 16, 231.
33. Huan, X.; Chen, B.; Zhou, H. A Unified Self-Supervised Framework for Plant Disease Detection on Laboratory and In-Field Images. Electronics 2025, 14, 3410.
34. Babu, T.; Nair, R.R.; Balusamy, B.; Khoh, W.H.; Nair, J. AgriFewNet—A Lightweight RGB Few-Shot Learning Model for Efficient Plant Disease Classification. Appl. Sci. 2025, 15, 12787.
35. Bektas, Y. FytoSol, a Promising Plant Defense Elicitor, Controls Early Blight (Alternaria solani) Disease in the Tomato by Inducing Host Resistance-Associated Gene Expression. Horticulturae 2022, 8, 484.
36. Safonova, A.; Stiller, S.; Yordanov, M.; Ryo, M. Self-Supervised Learning Outperforms Supervised Learning for Crop Classification by Annotating Only 5% of Images. Precis. Agric. 2025, 27, 4.
37. Mulugeta, A.T.; Jifara, W.; Bogale, E.; Desiyo, T.; Mokonnen, A. Early Detection and Classification of Potato Leaf Disease Using Convolutional Neural Networks. Appl. Comput. Intell. Soft Comput. 2025, 2025, 7614841.
38. Sinshaw, N.T.; Assefa, B.G.; Mohapatra, S.K. Transfer Learning and Data Augmentation Based CNN Model for Potato Late Blight Disease Detection. In Proceedings of the 2021 International Conference on Information and Communication Technology for Development for Africa (ICT4DA); IEEE: New York, NY, USA, 2021; pp. 30–35.
39. Ahmad, I.; Hamid, M.; Yousaf, S.; Shah, S.T.; Ahmad, M.O. Optimizing Pretrained Convolutional Neural Networks for Tomato Leaf Disease Detection. Complexity 2020, 2020, 8812019.
40. Cristea, A.-M.; Dobre, C. Federated Transfer Learning for Tomato Leaf Disease Detection Using Neuro-Graph Hybrid Model. AgriEngineering 2025, 7, 432.
41. Wu, Y.; Liu, J. Fine-Grained Identification of Greenhouse Crop Leaf Diseases Based on Reconstruction-Generation Network. PLoS ONE 2026, 21, e0343228.
42. Kabya, N.D.; Sharafat, M.D.S.; Emu, R.I.; Opee, M.K.; Khan, R. Towards Practical AI for Agriculture: A Self-Supervised Attention Framework for Spinach Leaf Disease Detection. PLoS ONE 2026, 21, e0340989.
43. Maqsood, A.; Wu, H.; Kamran, M.; Altaf, H.; Mustafa, A.; Ahmar, S.; Hong, N.T.T.; Tariq, K.; He, Q.; Chen, J.-T. Variations in Growth, Physiology, and Antioxidative Defense Responses of Two Tomato (Solanum lycopersicum L.) Cultivars after Co-Infection of Fusarium oxysporum and Meloidogyne incognita. Agronomy 2020, 10, 159.
44. Zhao, L.; Cheng, H.; Liu, H.-F.; Gao, G.-Y.; Zhang, Y.; Li, Z.-N.; Deng, J.-X. Pathogenicity and Diversity of Large-Spored Alternaria Associated with Three Solanaceous Vegetables (Solanum tuberosum, S. lycopersicum and S. melongena) in China. Plant Pathol. 2023, 72, 376–391.
45. Volynchikova, E.; Kim, K.D. Biological Control of Oomycete Soilborne Diseases Caused by Phytophthora capsici, Phytophthora infestans, and Phytophthora nicotianae in Solanaceous Crops. Mycobiology 2022, 50, 269–293.
46. Mugao, L. Morphological and Molecular Variability of Alternaria solani and Phytophthora infestans Causing Tomato Blights. Int. J. Microbiol. 2023, 2023, 8951351.
47. Data for: Identification of Plant Leaf Diseases Using a 9-Layer Deep Convolutional Neural Network—Mendeley Data. Available online: https://data.mendeley.com/datasets/tywbtsjrjv/1 (accessed on 24 February 2026).
48. Keras: Deep Learning for Humans. Available online: https://keras.io/ (accessed on 24 February 2026).
49. TensorFlow. Available online: https://www.tensorflow.org/ (accessed on 24 February 2026).
50. Li, J.; Wang, W. Deployment and Application of Deep Learning Models under Computational Constraints. In Proceedings of the 2023 IEEE International Conference on Big Data (BigData); IEEE: New York, NY, USA, 2023; pp. 2529–2533.
51. Gulzar, Y. Fruit Image Classification Model Based on MobileNetV2 with Deep Transfer Learning Technique. Sustainability 2023, 15, 1906.
52. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2018; pp. 4510–4520.
53. Lee, E.; Kim, Y.; Kim, K.; Shin, D.; Jang, S.-J. Designing an Optimized Processing Elements Structure for Depthwise Separable Convolution. In Proceedings of the 2025 International Conference on Electronics, Information, and Communication (ICEIC); IEEE: New York, NY, USA, 2025; pp. 1–3.
54. Lee, S.H.; Chan, C.S.; Mayo, S.J.; Remagnino, P. How Deep Learning Extracts and Learns Leaf Features for Plant Classification. Pattern Recognit. 2017, 71, 1–13.
55. Barbedo, J.G.A. A Review on the Main Challenges in Automatic Plant Disease Identification Based on Visible Range Images. Biosyst. Eng. 2016, 144, 52–60.
56. Tariq, M.H.; Sultan, H.; Akram, R.; Kim, S.G.; Kim, J.S.; Usman, M.; Gondal, H.A.H.; Seo, J.; Lee, Y.H.; Park, K.R. Estimation of Fractal Dimensions and Classification of Plant Disease with Complex Backgrounds. Fractal Fract. 2025, 9, 315.
57. Nagasubramanian, K.; Singh, A.; Singh, A.; Sarkar, S.; Ganapathysubramanian, B. Plant Phenotyping with Limited Annotation: Doing More with Less. Plant Phenome J. 2022, 5, e20051.
58. Keithellakpam, L.B.; Karunakaran, C.; Singh, C.B.; Jayas, D.S.; Danielski, R. A Comprehensive Review on Pre- and Post-Harvest Perspectives of Potato Quality and Non-Destructive Assessment Approaches. Appl. Sci. 2025, 16, 190.
59. Chen, Y.; Liu, W. CBSNet: An Effective Method for Potato Leaf Disease Classification. Plants 2025, 14, 632.
60. Bhuyan, A.S.; Thakur, M.; Farooq, S.A.; Kaur, S.; Suryakanta; Kuma, S. Automated Deep Learning System for Early Blight Disease Identification in Tomatoes. In Proceedings of the 2025 2nd International Conference on Computational Intelligence and Computing Applications (ICCICA); IEEE: New York, NY, USA, 2025; pp. 561–566.
61. Nishankar, S.; Mithuran, T.; Thuseethan, S.; Sebastian, Y.; Yeo, K.C.; Shanmugam, B. TOM-SSL: Tomato Disease Recognition Using Pseudo-Labelling-Based Semi-Supervised Learning. AgriEngineering 2025, 7, 248.
62. Schmey, T.; Tominello-Ramirez, C.S.; Brune, C.; Stam, R. Alternaria Diseases on Potato and Tomato. Mol. Plant Pathol. 2024, 25, e13435.
63. Radočaj, P.; Radočaj, D.; Martinović, G. Image-Based Leaf Disease Recognition Using Transfer Deep Learning with a Novel Versatile Optimization Module. Big Data Cogn. Comput. 2024, 8, 52.
64. Nawaz, M.; Javed, A.; Saudagar, A.K.J. PotatoGuardNet: A Refined Deep Learning Framework for Potato Leaf Disease Detection. Front. Plant Sci. 2026, 17, 1720276.
65. Zhu, X.; Mao, T.; Chen, J.; Peng, F.; Li, K. Attention-Based and Context-Aware Knowledge Distillation for Enhancing Crop Disease Detection. Appl. Soft Comput. 2026, 195, 115017.
66. Afzaal, H.; Farooque, A.A.; Schumann, A.W.; Hussain, N.; McKenzie-Gopsill, A.; Esau, T.; Abbas, F.; Acharya, B. Detection of a Potato Disease (Early Blight) Using Artificial Intelligence. Remote Sens. 2021, 13, 411.

Figure 1. Global yield trends for potato and tomato crops from 2000 to 2024 based on FAOSTAT data.

Figure 2. Overview of the experimental workflow, showing the multi-stage research design: initial self-supervised pretraining (ID 0) on the full PlantVillage dataset, followed by supervised fine-tuning tasks (ID 1–4) for cross-crop and multi-class disease classification using the MobileNetV2 architecture.

Figure 3. Examples of healthy leaves and leaves showing symptoms of Early Blight and Late Blight on potato and tomato plants from the PlantVillage dataset.

Figure 4. Convergence behavior of the MobileNetV2 backbone during the self-supervised pretraining phase (ID 0), illustrating the decrease in average training loss over 30 epochs.

Figure 5. Confusion matrices for Experiments 1–4, illustrating the classification performance of the MobileNetV2 model across different pretraining settings and host–pathogen combinations.

Figure 6. Visual explanation of model predictions using heatmap and Grad-CAM visualizations across Healthy, Early Blight, and Late Blight classes for both potatoes and tomatoes.

Table 1. Distribution of images across the experimental stages and dataset splits.

Experiment ID	Total Images Used	Training Images (70%)	Validation Images (15%)	Test Images (15%)
0	54,303	/	/	/
1	6651	4655	998	998
2a	4499	3149	675	675
2b	2152	1506	323	323
3a	2000	1400	300	300
3b	2908	2036	436	436
4	4908	3436	736	736
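
The split sizes in Table 1 can be checked with a few lines of arithmetic. The sketch below assumes that the validation and test subsets each receive 15% of the images, rounded to the nearest integer, and that the training subset takes the remainder. This rounding rule is an assumption that happens to reproduce the counts in the table, not a procedure stated in the paper, and the function name `split_counts` is hypothetical.

```python
def split_counts(total: int) -> tuple[int, int, int]:
    """Return (train, validation, test) sizes for a 70/15/15 split.

    Assumption: validation and test each get 15% of the images, rounded to
    the nearest integer, and training takes whatever is left.
    """
    holdout = round(total * 15 / 100)
    return total - 2 * holdout, holdout, holdout


# Total image counts per supervised experiment, copied from Table 1.
totals = {"1": 6651, "2a": 4499, "2b": 2152, "3a": 2000, "3b": 2908, "4": 4908}

for experiment_id, total in totals.items():
    train, val, test = split_counts(total)
    print(f"{experiment_id}: train={train}, val={val}, test={test}")
```

Experiment 0 is omitted because Table 1 lists no split for it: the 54,303 images are used unlabeled for self-supervised pretraining.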

Table 2. Performance metrics for supervised tasks (ID 1–4), comparing SSL pretraining with ImageNet initialization across crop–pathogen combinations.

Experiment ID	Accuracy	Precision (Macro)	Precision (Weighted)	Recall (Macro)	Recall (Weighted)	F1-Score (Macro)	F1-Score (Weighted)
1 (SSL)	0.9158	0.9396	0.9261	0.8793	0.9158	0.9013	0.9143
1 (ImageNet)	0.8577	0.8871	0.8860	0.8418	0.8577	0.8392	0.8478
2a	0.9141	0.9002	0.9168	0.9100	0.9141	0.9040	0.9144
2b	0.7245	0.8541	0.7991	0.6759	0.7245	0.7068	0.7062
3a	0.9600	0.9634	0.9629	0.9595	0.9600	0.9599	0.9599
3b	0.9359	0.9209	0.9449	0.9498	0.9359	0.9313	0.9370
4	0.8616	0.9082	0.8824	0.8368	0.8616	0.8572	0.8567

Table 3. Detailed classification metrics per experiment ID, crop type, and leaf status.

Experiment ID	Crop	Leaf Status	Precision	Recall	F1-Score	No. of Samples per Class
1 (SSL)	Tomato	H	1.0000	1.0000	1.0000	231
1 (SSL)	Tomato	EB	0.9706	0.8684	0.9167	152
1 (SSL)	Tomato	LB	0.8473	0.9899	0.9130	297
1 (SSL)	Potato	H	1.0000	0.7778	0.8750	27
1 (SSL)	Potato	EB	1.0000	0.7152	0.8339	158
1 (SSL)	Potato	LB	0.8200	0.9248	0.8693	133
1 (ImageNet)	Tomato	H	0.9130	1.0000	0.9545	231
1 (ImageNet)	Tomato	EB	1.0000	0.6447	0.7840	152
1 (ImageNet)	Tomato	LB	0.8409	0.9966	0.9122	297
1 (ImageNet)	Potato	H	0.8966	0.9630	0.9286	27
1 (ImageNet)	Potato	EB	1.0000	0.5063	0.6723	158
1 (ImageNet)	Potato	LB	0.6720	0.9398	0.7837	133
2a	Tomato	H	0.9765	1.0000	0.9881	249
2a	Tomato	EB	0.7917	0.8693	0.8287	153
2a	Tomato	LB	0.9325	0.8608	0.8952	273
2b	Potato	H	1.0000	0.5517	0.7111	29
2b	Potato	EB	0.6422	1.0000	0.7822	149
2b	Potato	LB	0.9200	0.4759	0.6273	145
3a	Tomato	EB	1.0000	0.9189	0.9577	148
3a	Potato	EB	0.9268	1.0000	0.9620	152
3b	Tomato	LB	0.9962	0.9062	0.9491	288
3b	Potato	LB	0.8457	0.9933	0.9136	149
4	Tomato	EB	0.8618	0.7910	0.8249	134
4	Tomato	LB	0.7784	0.9736	0.8651	303
4	Potato	EB	0.9928	0.9787	0.9857	141
4	Potato	LB	1.0000	0.6038	0.7529	159

H: Healthy, EB: Early Blight, LB: Late Blight.
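
The macro and weighted columns of Table 2 are two ways of averaging the per-class values in Table 3: the macro average treats every class equally, while the weighted average weights each class by its number of test samples. The sketch below recomputes both for Experiment 1 (SSL) from the Table 3 rows. The helper names `macro_average` and `weighted_average` are illustrative only, and because the inputs are the published four-decimal values, the results should match Table 2 only up to rounding.

```python
# Per-class test metrics for Experiment 1 (SSL), copied from Table 3:
# (precision, recall, F1-score, number of samples).
per_class = {
    ("Tomato", "H"): (1.0000, 1.0000, 1.0000, 231),
    ("Tomato", "EB"): (0.9706, 0.8684, 0.9167, 152),
    ("Tomato", "LB"): (0.8473, 0.9899, 0.9130, 297),
    ("Potato", "H"): (1.0000, 0.7778, 0.8750, 27),
    ("Potato", "EB"): (1.0000, 0.7152, 0.8339, 158),
    ("Potato", "LB"): (0.8200, 0.9248, 0.8693, 133),
}


def macro_average(values):
    """Unweighted mean over classes: every class counts the same."""
    return sum(values) / len(values)


def weighted_average(values, weights):
    """Mean over classes, weighted by the number of samples per class."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


supports = [row[3] for row in per_class.values()]

# Maps metric name -> (macro average, weighted average).
summary = {}
for index, name in enumerate(("precision", "recall", "f1")):
    values = [row[index] for row in per_class.values()]
    summary[name] = (macro_average(values), weighted_average(values, supports))
    macro, weighted = summary[name]
    print(f"{name}: macro={macro:.4f}, weighted={weighted:.4f}")
```

The weighted recall equals the overall accuracy, since each class recall multiplied by its sample count gives the number of correctly classified samples in that class. The gap between macro and weighted recall reflects class imbalance: the small Potato Healthy class (27 samples) pulls the macro average down more than the weighted one.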


