Abstract
The rapid growth of artificial intelligence (AI) has enabled efficient crop disease detection even in data-scarce agricultural settings. This study proposes AgriFewNet, a few-shot learning framework designed to improve classification accuracy using RGB imagery drawn from publicly available datasets. The objective is to enable fast model adaptation to new disease classes using minimal labeled samples while maintaining high reliability in real-world conditions. AgriFewNet employs a hierarchical attention-enhanced ResNet-18 backbone incorporating dual spatial and channel attention to extract discriminative RGB features. A Model-Agnostic Meta-Learning (MAML) approach facilitates quick generalization to previously unseen disease categories, while a prototype-based classifier promotes compact representation learning. Using only RGB images, experiments on the PlantVillage and New PlantVillage datasets produced accuracies of 87.3% (1-shot), 94.8% (5-shot), and 97.1% (10-shot), surpassing leading few-shot baselines by as much as 7.9 percentage points. The findings show that AgriFewNet offers a resource-efficient and scalable method for intelligent crop monitoring, enhancing food security and precision agriculture.
1. Introduction
Plant diseases constitute a major global challenge, contributing up to 40% yield loss across several high-value crops and threatening food security in climate-sensitive regions [1]. Early, reliable, and scalable disease diagnosis is essential for minimizing pesticide misuse, guiding precision interventions, and supporting sustainable farming practices. With the rapid evolution of computer vision and artificial intelligence (AI), deep convolutional neural networks (CNNs) have emerged as powerful tools for automating disease detection from leaf images, offering advantages such as rapid inference, non-destructive assessment, and consistent performance compared to manual scouting [2,3,4]. Machine learning has proven highly successful across scientific domains involving complex classification problems such as hazard identification of near-Earth objects using Random Forest algorithms [5].
Publicly available datasets such as PlantVillage have played a foundational role in supporting model development by enabling the training and benchmarking of deep learning models under controlled imaging conditions [6]. These datasets have enabled significant progress in plant phenotyping and disease recognition. Nevertheless, obtaining extensive labeled image sets for each crop–disease combination is still impractical in real-world agricultural settings. The efficacy of traditional supervised learning techniques is often hampered by severe data scarcity for rare diseases, early-stage symptoms, and region-specific crops. This motivates the development of few-shot learning (FSL) models that can generalize from a small number of samples per class.
1.1. Current State of Research
Over the past ten years, deep learning-based plant disease detection has made significant progress [7]. CNN architectures that have proven to perform well on large-scale datasets include ResNet, EfficientNet, and MobileNet [7], and transfer learning techniques have further enhanced generalization in the absence of substantial domain-specific data [8]. By highlighting discriminative visual regions and reducing background noise, attention-based models such as CBAM and SE-Net variants have improved disease localization [9].
Meta-learning and few-shot learning frameworks have recently gained popularity for agricultural applications with limited annotated data. Metric-learning techniques such as Prototypical Networks [10], Relation Networks [11], and Matching Networks learn feature spaces suited to fast adaptation, while optimization-based methods such as the Model-Agnostic Meta-Learning (MAML) algorithm [12] enable rapid fine-tuning using only a few gradient steps. Few-shot learning has been explored in emerging agricultural studies for crop category identification, stress detection, and disease classification [13]. However, the majority of studies are still restricted to standard datasets and do not adequately address real-world deployment limitations. According to recent research, Tensor Processing Units (TPUs) greatly speed up deep learning pipelines for large-scale image analysis applications, such as satellite land classification, allowing for better throughput and faster convergence [14].
While multimodal sensing such as hyperspectral, NIR, or thermal imaging is often suggested as a possible avenue to improve plant disease detection [15], due to its high sensor cost, enormous data dimensionality, and restricted public access, these modalities are still mostly limited to research labs. As a result, RGB imaging is mostly used in real-world agricultural systems due to its low cost, ease of large-scale deployment, and interoperability with cellphones and drones.
1.2. Limitations and Objectives of the Study
Three significant obstacles still exist in the current research on agricultural disease classification despite significant advancements:
- Dependency on large datasets: Standard CNN and transfer learning frameworks require hundreds of labeled images per disease class, which are rarely available for rare diseases or newly emerging pathogen outbreaks.
- Limited generalization in cross-domain or data-scarce environments: Models trained on controlled datasets often fail to adapt to changes in crop varieties, environmental conditions, or region-specific imaging features.
- Lack of lightweight, deployable architectures: Many existing models prioritize accuracy at the expense of computational efficiency, hindering adoption on edge devices such as handheld tools, drones, or field robots.
Furthermore, although multimodal imaging is often discussed as a future enhancement, RGB-only systems remain the most realistic for wide-scale agricultural deployment. Hyperspectral and thermal sensors were not used in this work due to the following: limited dataset availability for few-shot learning benchmarks, high acquisition cost and calibration requirements [16], unsuitability for low-resource farming contexts, and extremely high feature dimensionality in hyperspectral data that conflicts with the goal of building a lightweight, fast-adapting model [17].
Given these challenges, this study introduces AgriFewNet, a data-efficient plant disease classification framework designed explicitly for few-shot agricultural scenarios using only RGB imagery.
1.3. Objectives of the Study
The following are the main goals of this work:
- To provide an improved representation of agricultural RGB imagery using a hierarchical attention-based feature extraction network.
- To design a classification model that is driven by an adaptive prototype and optimized for few-shot agricultural learning tasks.
- To employ a meta-learning approach based on MAML for effective adaptation in cross-domain and data-limited instances.
- To validate performance in 1-, 5-, and 10-shot scenarios in order to show real applicability, scalability, and robustness.
The proposed AgriFewNet architecture seeks to overcome these constraints in order to provide a lightweight, flexible, and field-ready solution that closes the performance gap between deep learning systems trained in laboratories and the reality of agricultural deployment.
The remainder of this paper is organized as follows. Section 2 introduces the materials and methods, detailing the model architecture, training protocols, dataset features, and mathematical formulation. Section 3 presents the results and analysis. Section 4 discusses the results. Section 5 concludes the paper with a brief review of the contributions and recommends possible applications in precision agriculture.
2. Materials and Methods
Few-shot learning (FSL) for crop monitoring employing meta-learning techniques was supported by a thorough methodological framework. Deep feature learning, meta-learning, and temporal modeling are all necessary for the rigorous methodological design of the proposed AgriFewNet smart agricultural monitoring system, which can function in data-constrained environments [18]. These characteristics are combined in the suggested AgriFewNet few-shot learning framework to enable quick adaptation to new crop types and disease classes with little supervision. The methodological process comprises data preprocessing, hierarchical feature learning with attention mechanisms, prototype-based classification, meta-learning adaptation, temporal consistency modeling, and multi-objective loss optimization [19,20].
2.1. Materials and Dataset Preparation
The proposed AgriFewNet few-shot learning system was thoroughly tested on both the original PlantVillage dataset, which contains 54,303 images distributed over 38 classes, and the expanded New PlantVillage dataset, which includes improved annotations. The experimental framework was implemented in PyTorch 1.12 on NVIDIA RTX 3090 GPUs with 24 GB of memory. To assess the model’s ability to generalize across novel agricultural contexts, the classes were split into three disjoint sets: meta-training (23 classes, 60.5%), meta-validation (8 classes, 21.1%), and meta-testing (7 classes, 18.4%). These ratios are in line with common practice in few-shot meta-learning frameworks such as MAML and Prototypical Networks, where a larger share of classes is needed for meta-training in order to produce a variety of episodic tasks. Meta-testing is kept within 15–20% to guarantee enough unseen classes for trustworthy generalization estimates, while meta-validation needs a moderate fraction to tune adaptation behavior without overfitting. The chosen ratio was also supported empirically: preliminary tests with other splits showed less consistent performance and less stable convergence.
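The class-level split can be checked with a short calculation. The sketch below is illustrative only: the class counts come from the text, while the integer class ids, the shuffling seed, and the function names are assumptions and do not reflect the authors' actual assignment of classes to partitions.

```python
import random

# Class counts per partition as reported in the text (38 classes in total).
SPLIT = {"meta-train": 23, "meta-val": 8, "meta-test": 7}

def split_fractions(split):
    """Return each partition's share of the classes, as a percentage."""
    total = sum(split.values())
    return {name: round(100.0 * n / total, 1) for name, n in split.items()}

def split_classes(class_ids, split, seed=0):
    """Shuffle class ids and cut them into disjoint partitions.

    The seed and the random assignment are illustrative assumptions.
    """
    ids = list(class_ids)
    random.Random(seed).shuffle(ids)
    partitions, start = {}, 0
    for name, n in split.items():
        partitions[name] = ids[start:start + n]
        start += n
    return partitions

fractions = split_fractions(SPLIT)
partitions = split_classes(range(38), SPLIT)
print(fractions)
```

Because the split is made over classes rather than images, no disease class seen during meta-training can appear in meta-validation or meta-testing episodes.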
Meta-training was conducted over 10,000 episodes, with 15 query examples per class in each episode. The meta-learning system used a batch size of 32 and separate inner-loop and outer-loop (meta) learning rates. To improve model robustness, random rotation (±15°), color jittering (brightness and contrast ±0.2), horizontal and vertical flips, Gaussian noise, and mixup augmentation were applied systematically. The augmentation parameters were selected based on both prior studies in agricultural image augmentation and a preliminary sensitivity analysis conducted in our experiments. The rotation range (±15°) aligns with established works on plant disease detection that simulate natural leaf orientation changes without geometric distortions. Color jittering values (brightness/contrast ±0.2) were adopted from widely used augmentation settings shown to preserve symptom visibility while improving robustness to lighting variations. We evaluated broader ranges (±30° rotation and ±0.4 jitter), but these caused unnatural distortions or reduced classification stability, supporting the chosen values.
2.2. Methods
The agricultural monitoring problem is framed as an N-way K-shot classification task under the few-shot learning (FSL) paradigm. Each meta-learning episode (or task) consists of a support set and a query set. The support set $S$ contains the few labeled samples used for learning or fine-tuning the model parameters for a specific task. The support set is as follows:
$$S = \{(x_i, y_i)\}_{i=1}^{N \times K} \tag{1}$$
$S$ is provided with K examples per class, where N denotes the number of distinct classes (N-way) within a meta-task and K represents the number of labeled examples per class (K-shot) used for adaptation. $x_i \in \mathbb{R}^{H \times W \times C}$ is an input image, where H, W, and C denote height, width, and number of channels, respectively. $y_i \in \{1, \dots, N\}$ is the corresponding class label for $x_i$.
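The query set $Q$ contains unseen samples from the same task that are used to evaluate task-specific generalization. The query set Q is as follows:
$$Q = \{(x_j, y_j)\}_{j=1}^{N_Q} \tag{2}$$
where $N_Q$ is the number of query samples in the task. Each task $\mathcal{T}_i$ is sampled from a task distribution $p(\mathcal{T})$ that defines the variability of agricultural conditions such as crop type, disease, or season. The model parameters $\theta$ are optimized to minimize the expected loss over all tasks:
$$\min_{\theta} \; \mathcal{J}(\theta) = \mathbb{E}_{\mathcal{T}_i \sim p(\mathcal{T})} \left[ \frac{1}{N_Q} \sum_{j=1}^{N_Q} \mathcal{L}\big(f_{\theta}(x_j), y_j\big) \right] \tag{3}$$
where $\mathcal{J}(\theta)$ denotes the meta-learning objective function that measures the expected error across tasks. $f_{\theta}$ is the parameterized model (e.g., the proposed attention-based ResNet-18 feature extractor with meta-learning). $\mathcal{L}$ represents the loss function (typically cross-entropy) computed between the predicted and ground-truth labels. $x_j$ and $y_j$ are samples and labels drawn from the query set Q. $\mathbb{E}_{\mathcal{T}_i \sim p(\mathcal{T})}$ indicates averaging across different tasks to ensure generalization to unseen agricultural scenarios.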
ResNet-18 was chosen as the feature extraction backbone to ensure a balance between representational capacity and computational efficiency required for few-shot adaptation. We conducted preliminary comparisons using a deeper model (ResNet-34) and a lighter model (MobileNetV2). ResNet-34 offered marginal accuracy improvement (+0.4%) but incurred a 2.1× increase in parameters and a 37% slower adaptation speed, which negatively affects meta-learning efficiency. In contrast, MobileNetV2 reduced parameter count by 31% but led to a drop of −2.6% in 5-shot accuracy due to its limited ability to capture fine-grained disease features. Considering these findings, ResNet-18 provided the best trade-off between adaptation speed, computational load, and discriminative feature learning in agricultural few-shot scenarios.
2.3. Feature Extraction Network
The feature extraction module is designed to learn discriminative and robust features. A modified ResNet-18 backbone incorporating dual attention mechanisms [21], namely channel and spatial attention, is used to extract global context and localized disease-specific information.
Let $F \in \mathbb{R}^{H \times W \times C}$ denote the intermediate feature map obtained from a convolutional block of the backbone network, where H and W represent the spatial dimensions (height and width) and C denotes the number of feature channels.
The overall attention-enhanced feature extraction process consists of two sequential modules: spatial attention and channel attention.
- (a) Spatial Attention Module
Spatial attention refines the feature representation by highlighting disease-relevant regions (e.g., lesions or infected spots) within the image. It computes an attention mask based on both average-pooled and max-pooled features along the channel dimension:
$$M_s(F) = \sigma\!\left(f^{k \times k}\big([F^{s}_{\text{avg}};\, F^{s}_{\text{max}}]\big)\right) \tag{4}$$
where $F^{s}_{\text{avg}}$ and $F^{s}_{\text{max}}$ aggregate spatial information across channels using average and maximum pooling, respectively. $[\,\cdot\,;\,\cdot\,]$ denotes channel-wise concatenation, $f^{k \times k}$ is a convolutional layer with kernel size $k \times k$ to enlarge the receptive field, and $\sigma$ is the sigmoid activation function that normalizes attention scores to the range $[0, 1]$.
The resulting attention map captures spatially significant regions in the input feature space.
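The spatial attention computation described above can be sketched in plain Python on a toy feature map. This is a minimal illustration, not the authors' implementation: the 3 × 3 kernel, its hand-set weights, the zero padding, and the toy input are all assumptions chosen so the result is easy to verify by hand. In practice the convolution would be a learned layer.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def spatial_attention(feat, kernel, bias=0.0):
    """feat: C x H x W nested lists; kernel: 2 x k x k weights (avg map, max map), k odd.

    Returns an H x W attention mask with values in (0, 1).
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    # Pool along the channel axis to obtain two H x W maps.
    avg = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(feat[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    pooled = [avg, mx]  # channel-wise "concatenation" of the two maps
    k = len(kernel[0])
    pad = k // 2
    mask = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = bias
            for ch in range(2):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < H and 0 <= jj < W:  # zero padding outside the map
                            acc += kernel[ch][di][dj] * pooled[ch][ii][jj]
            mask[i][j] = sigmoid(acc)
    return mask

# Toy input: two channels with a single "lesion" response at the center pixel.
feat = [
    [[0.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]],
]
center_only = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
zeros = [[0.0] * 3 for _ in range(3)]
mask = spatial_attention(feat, [center_only, zeros])  # hypothetical, hand-set kernel
```

With this kernel the mask equals the sigmoid of the channel-averaged map, so the center pixel receives a weight close to 1 while the empty background stays at the neutral value 0.5.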
- (b) Channel Attention Module
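Channel attention adaptively weights feature channels according to their relevance for disease recognition. This is implemented using a multi-layer perceptron (MLP) applied to global average-pooled and max-pooled features:
$$M_c(F) = \sigma\big(\text{MLP}(F^{c}_{\text{avg}}) + \text{MLP}(F^{c}_{\text{max}})\big) \tag{5}$$
where $F^{c}_{\text{avg}}, F^{c}_{\text{max}} \in \mathbb{R}^{C}$ capture global channel-wise statistics.
The $\text{MLP}$ consists of two fully connected layers with a reduction ratio r (typically $r = 16$) to reduce parameter overhead:
$$\text{MLP}(v) = W_2\, \delta(W_1 v) \tag{6}$$
with $\delta$ denoting the ReLU activation; $W_1 \in \mathbb{R}^{(C/r) \times C}$ and $W_2 \in \mathbb{R}^{C \times (C/r)}$ are learnable weight matrices, and $\sigma$ ensures the attention weights are normalized between 0 and 1.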
The reduction ratio r in the channel attention module controls the dimensionality bottleneck of the MLP. To determine the optimal value, we evaluated r ∈ {4, 8, 16, 32}. Lower r values (e.g., 4 or 8) improved expressive power but increased parameters by 18–35%, leading to slower adaptation in meta-training. Larger r values (e.g., 32) reduced parameters but caused underfitting and a 1.9% drop in accuracy. The choice of r = 16 provided an optimal balance, achieving the best 5-shot accuracy while maintaining low computational overhead.
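To make the role of the reduction ratio concrete, the sketch below implements a shared two-layer bottleneck MLP over pooled channel statistics and counts its weights for each candidate r. It is a hedged illustration: summing the two MLP outputs before the sigmoid follows the common CBAM-style design and is an assumption here, as are the toy weights, the omission of bias terms, and the choice of C = 512 (the width of the final ResNet-18 stage) for the parameter count.

```python
import math

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def channel_attention(feat, W1, W2):
    """feat: C x H x W; W1: (C/r) x C; W2: C x (C/r). Returns C weights in (0, 1)."""
    # Global average and max pooling over the spatial dimensions.
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(max(row) for row in ch) for ch in feat]
    mlp = lambda v: matvec(W2, relu(matvec(W1, v)))  # shared bottleneck MLP
    return [1.0 / (1.0 + math.exp(-(a + b))) for a, b in zip(mlp(avg), mlp(mx))]

def mlp_weight_count(C, r):
    """Weights in the two bias-free layers: C*(C/r) + (C/r)*C."""
    return 2 * C * (C // r)

# Toy example with C = 4 channels and r = 2 (hand-set, hypothetical weights).
feat = [[[1.0, 3.0]], [[0.0, 0.0]], [[2.0, 2.0]], [[-1.0, 1.0]]]
W1 = [[0.5] * 4, [-0.5] * 4]
W2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
weights = channel_attention(feat, W1, W2)

counts = {r: mlp_weight_count(512, r) for r in (4, 8, 16, 32)}
print(counts)
```

Halving r doubles the MLP's weight count, which is the trade-off between expressive power and overhead that the ablation over r explores. The percentages reported in the text are not reproduced by this count, since their reference baseline is not stated.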
- (c) Combined Attention-Enhanced Features
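The final refined feature map incorporates both spatial and channel attention via element-wise multiplication:
$$F_{\text{att}} = M_c(F) \odot M_s(F) \odot F \tag{7}$$
where ⊙ denotes the Hadamard (element-wise) product, with each attention map broadcast along the dimensions it does not span.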
Equation (7) lets the model capture both spatial saliency (disease areas) and channel-wise relevance. This dual attention method suppresses irrelevant background noise while emphasizing discriminative regions associated with crop diseases.
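The element-wise combination can be illustrated with explicit broadcasting. In this sketch the channel weights (one per channel) and the spatial mask (one value per pixel) are applied to a toy feature map. The values are hypothetical, and whether the channel weights are computed from the raw or the spatially attended map is not specified in the text, so the function simply takes both attention outputs as given.

```python
def apply_dual_attention(feat, channel_w, spatial_mask):
    """Broadcasted element-wise product.

    feat: C x H x W, channel_w: length C, spatial_mask: H x W.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    return [[[channel_w[c] * spatial_mask[i][j] * feat[c][i][j]
              for j in range(W)]
             for i in range(H)]
            for c in range(C)]

# Hypothetical toy values: 2 channels, 1 x 2 spatial grid.
feat = [[[2.0, 4.0]], [[1.0, 3.0]]]
channel_w = [0.5, 1.0]       # the second channel is judged more relevant
spatial_mask = [[1.0, 0.5]]  # the right pixel is judged less salient
refined = apply_dual_attention(feat, channel_w, spatial_mask)
print(refined)
```

Each output value is scaled once by its channel's weight and once by its pixel's weight, so a response survives only where both the channel and the location are considered informative.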
- (d) Summary of Feature Flow
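The overall feature extraction procedure can be represented as follows:
$$F = \text{ResNet-18}(I) \tag{8}$$
where I is the input image. The spatially attended feature and the final attention-enhanced feature are as follows:
$$F_s = M_s(F) \odot F, \qquad F_{\text{att}} = M_c(F) \odot F_s \tag{9}$$
where $F_{\text{att}}$ is the final feature embedding, which is passed to the prototype-based classification stage.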
This hierarchical attention-guided feature extraction supports robust representation learning, allowing for accurate few-shot disease recognition under a variety of lighting, occlusion, and environmental noise conditions.
2.4. Prototype-Based Classification
Prototype-based few-shot learning (FSL) represents each class by a single prototype vector computed from its support examples in the learned embedding space [22]. This metric-based approach enables classification by computing the distance between a query sample and these prototype vectors.
Let the embedding function $f_{\theta}$ be a neural network that maps an input image x to a d-dimensional feature vector:
$$f_{\theta}: \mathbb{R}^{H \times W \times C} \rightarrow \mathbb{R}^{d} \tag{10}$$
where H, W, and C denote the image height, width, and number of channels, respectively, and d represents the dimension of the learned feature space.
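For each class $k \in \{1, \dots, N\}$, its prototype vector $c_k$ is computed as the mean of the embedded support examples belonging to that class:
$$c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_{\theta}(x_i) \tag{11}$$
where $c_k \in \mathbb{R}^{d}$ represents the centroid in feature space and $S_k$ denotes the support set of class k containing K labeled examples $(x_i, y_i)$. The prototype thus acts as a representative feature centroid capturing the key characteristics of class k.
For a given query image $x_q$, the similarity to each class is determined by computing the squared Euclidean distance between its embedding $f_{\theta}(x_q)$ and each prototype $c_k$:
$$d(x_q, c_k) = \lVert f_{\theta}(x_q) - c_k \rVert_2^2 \tag{12}$$
To obtain class probabilities, a softmax function is applied over the negative distances:
$$p(y = k \mid x_q) = \frac{\exp\big(-d(x_q, c_k)\big)}{\sum_{k'=1}^{N} \exp\big(-d(x_q, c_{k'})\big)} \tag{13}$$
where $p(y = k \mid x_q)$ is the predicted probability that the query image belongs to class k (Equation (13)).
The predicted class label for the query image is then given by the class with the highest posterior probability:
$$\hat{y} = \arg\max_{k} \; p(y = k \mid x_q) \tag{14}$$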
This prototype-based approach provides a simple yet effective mechanism for few-shot classification. By comparing a query embedding with precomputed class prototypes, the model achieves efficient inference while maintaining high discriminative power, even in data-scarce agricultural conditions.
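The prototype classifier described in this section (class means, squared Euclidean distances, softmax over negative distances, and an argmax decision) can be sketched on toy two-dimensional embeddings. The class names and embedding values below are hypothetical. In the full system the embeddings would come from the attention-enhanced backbone.

```python
import math

def prototypes(support):
    """support: dict class -> list of embedding vectors. Returns class -> mean vector."""
    return {k: [sum(col) / len(vecs) for col in zip(*vecs)] for k, vecs in support.items()}

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(query, protos):
    """Softmax over negative squared distances; returns (predicted class, probabilities)."""
    logits = {k: -sq_dist(query, c) for k, c in protos.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    probs = {k: e / z for k, e in exps.items()}
    return max(probs, key=probs.get), probs

# Hypothetical 2-way 2-shot support set in a 2-D embedding space.
support = {
    "healthy": [[0.0, 0.0], [0.0, 2.0]],
    "blight": [[4.0, 4.0], [6.0, 4.0]],
}
protos = prototypes(support)
label, probs = classify([1.0, 1.0], protos)
print(label, probs)
```

Because inference only requires one distance per class, adding a new disease class amounts to averaging a handful of support embeddings rather than retraining a classification head.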
2.5. Meta-Learning Adaptation
The meta-learning component aims to enable rapid adaptation of the model parameters to new agricultural tasks with minimal labeled data [23]. We adopt the Model-Agnostic Meta-Learning (MAML) framework, which learns an initialization of $\theta$ that can be fine-tuned efficiently on a small support set for a new task [24].
- Task Definition:
Let $\mathcal{T}_i$ denote a sampled task from the distribution of agricultural tasks $p(\mathcal{T})$. Each task consists of a support set $S_i$ (used for adaptation) and a query set $Q_i$ (used for evaluation).
- Inner-loop Adaptation:
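For each task $\mathcal{T}_i$, the model performs one or more gradient-based updates using its support set $S_i$. The adapted task-specific parameters $\theta_i'$ are computed as follows:
$$\theta_i' = \theta - \alpha \nabla_{\theta} \mathcal{L}_{S_i}(f_{\theta}) \tag{15}$$
where $\theta$ denotes the global model parameters shared across tasks and $\alpha$ is the inner learning rate controlling the magnitude of adaptation. $\mathcal{L}_{S_i}$ is the loss function computed on the support examples and $f_{\theta}$ is the neural network parameterized by $\theta$.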
This step enables the model to specialize its parameters to the task by performing one or more gradient descent steps.
- Outer-loop Meta-update:
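After adaptation, the model’s performance is evaluated on the query set $Q_i$ to compute the meta-objective:
$$\mathcal{L}_{\text{meta}}(\theta) = \frac{1}{B} \sum_{i=1}^{B} \mathcal{L}_{Q_i}(f_{\theta_i'}) \tag{16}$$
where B is the number of tasks sampled per meta-batch. The global model parameters are updated by minimizing this meta-loss:
$$\theta \leftarrow \theta - \beta \nabla_{\theta} \mathcal{L}_{\text{meta}}(\theta) \tag{17}$$
where $\beta$ is the meta learning rate governing the speed of meta-optimization, $\nabla_{\theta} \mathcal{L}_{\text{meta}}$ is the gradient of the query loss with respect to the initial parameters before adaptation, and $\mathcal{L}_{\text{meta}}$ is the overall meta-learning loss aggregating task-level feedback.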
- Overall Objective:
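The optimization alternates between the inner and outer updates to minimize the expected query loss across all tasks sampled from $p(\mathcal{T})$:
$$\theta^{*} = \arg\min_{\theta} \; \mathbb{E}_{\mathcal{T}_i \sim p(\mathcal{T})} \big[ \mathcal{L}_{Q_i}(f_{\theta_i'}) \big] \tag{18}$$
where $\theta_i'$ is obtained via Equation (15). This process ensures that $\theta$ becomes a meta-initialization capable of rapid adaptation to unseen agricultural tasks with only a few gradient steps [25].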
Together, Equations (15)–(18) define the bi-level optimization mechanism enabling the model to generalize efficiently across new agricultural monitoring tasks with minimal supervision.
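The bi-level update can be traced by hand on a deliberately simple problem. In the sketch below the model is a single scalar parameter and each task has a quadratic support loss and a quadratic query loss, so both the inner gradient and the gradient through the inner step are available in closed form. The task targets, learning rates, and number of iterations are illustrative assumptions and are unrelated to the values used in the experiments.

```python
def maml_step(theta, tasks, alpha, beta):
    """One meta-update on scalar tasks with quadratic losses.

    Each task is (a, b): support loss (theta - a)^2, query loss (theta - b)^2.
    Returns the updated theta and the meta-loss evaluated before the update.
    """
    meta_grad, meta_loss = 0.0, 0.0
    for a, b in tasks:
        # Inner loop: one gradient step on the support loss.
        theta_i = theta - alpha * 2.0 * (theta - a)
        # Query loss at the adapted parameter.
        meta_loss += (theta_i - b) ** 2
        # Chain rule through the inner step: d(theta_i)/d(theta) = 1 - 2*alpha.
        meta_grad += 2.0 * (theta_i - b) * (1.0 - 2.0 * alpha)
    B = len(tasks)
    # Outer loop: gradient step on the batch-averaged query loss.
    return theta - beta * meta_grad / B, meta_loss / B

# Hypothetical meta-batch of three tasks: (support target, query target).
tasks = [(1.0, 1.2), (3.0, 2.8), (2.0, 2.1)]
theta, history = 0.0, []
for _ in range(50):
    theta, loss = maml_step(theta, tasks, alpha=0.1, beta=0.1)
    history.append(loss)
print(round(theta, 3), round(history[0], 3), round(history[-1], 3))
```

The factor `(1.0 - 2.0 * alpha)` is what distinguishes the full MAML gradient from a first-order approximation, which would treat the adapted parameter as independent of the initialization. For neural networks this term requires differentiating through the inner update with automatic differentiation.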
- Task Sampling Strategy for Episodic Meta-Learning:
To ensure reproducible meta-training, we explicitly define the task sampling procedure used to construct each episodic N-way K-shot task. Let $\mathcal{C}_{\text{train}}$ denote the set of available training classes. For every episode, we first randomly sample N distinct classes from $\mathcal{C}_{\text{train}}$ using uniform sampling without replacement. For each selected class c, we randomly select K samples to form the support set, ensuring that no image appears in both support and query partitions. Additionally, for each class, we sample Q distinct images to construct the query set, where Q = 15 in our experiments. All samples are drawn uniformly and shuffled to prevent ordering effects. Thus, each task consists of the following:
$$\mathcal{T}_i = \{S_i, Q_i\}, \qquad |S_i| = N \times K, \qquad |Q_i| = N \times Q. \tag{19}$$
During meta-training, all tasks are drawn from the meta-training class set $\mathcal{C}_{\text{train}}$ (23 classes). During meta-validation (8 classes) and meta-testing (7 classes), task sampling follows the same procedure but uses disjoint class partitions to measure cross-class generalization. This episodic sampling strategy ensures task diversity, balanced class representation, and alignment with standard few-shot learning benchmarks such as miniImageNet and PlantVillage-based FSL setups.
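The episodic sampling procedure can be sketched as follows. The pool of image identifiers, the value N = 5, and the seed are hypothetical and chosen only for illustration. The 15 query images per class and the 23 meta-training classes follow the text.

```python
import random

def sample_episode(images_by_class, n_way, k_shot, n_query, rng):
    """Build one N-way K-shot episode.

    images_by_class: dict class -> list of image ids.
    Returns (support, query), each a shuffled list of (image_id, episode_label).
    """
    # N distinct classes, sampled uniformly without replacement.
    classes = rng.sample(sorted(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        # Drawing K + Q images at once keeps support and query disjoint.
        picked = rng.sample(images_by_class[cls], k_shot + n_query)
        support += [(img, label) for img in picked[:k_shot]]
        query += [(img, label) for img in picked[k_shot:]]
    rng.shuffle(support)  # avoid ordering effects
    rng.shuffle(query)
    return support, query

# Hypothetical pool: 23 meta-training classes with 40 image ids each.
pool = {f"class_{c}": [f"c{c}_img{i}" for i in range(40)] for c in range(23)}
support, query = sample_episode(pool, n_way=5, k_shot=5, n_query=15,
                                rng=random.Random(0))
print(len(support), len(query))
```

Passing an explicit `random.Random` instance makes each episode reproducible from its seed, which is one simple way to realize the reproducibility goal stated above.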
2.6. RGB-Based Embedding and Feature Utilization Strategy
Since the experiments in this study rely exclusively on RGB imagery, AgriFewNet employs a single-stream feature extraction pipeline [26]. The hierarchical attention-enhanced ResNet-18 backbone generates discriminative RGB embeddings that serve as input to the prototype-based classifier during meta-learning.
Let $F_{\text{RGB}}$ denote the feature maps extracted from the RGB encoder after applying the spatial and channel attention modules described in Section 2.3. These embeddings capture texture, color variation, lesion patterns, and shape characteristics relevant to plant disease identification.
The feature vector used for prototype construction is obtained by global average pooling:
$$z_{\text{RGB}} = \text{GAP}(F_{\text{RGB}}) \tag{20}$$
where $z_{\text{RGB}} \in \mathbb{R}^{d}$ denotes the compact RGB embedding.
During each meta-learning episode, the support-set embeddings are used to compute class prototypes following Equation (11), while query embeddings are used during optimization following Equations (12)–(14). This RGB-only design ensures computational efficiency, reduces memory overhead, and aligns directly with publicly available agricultural datasets, which predominantly provide RGB images.
2.7. Training Algorithm
The learning system as a whole is optimized by an episodic meta-training procedure in accordance with Model-Agnostic Meta-Learning (MAML). Every training episode mimics a few-shot learning task sampled from the task distribution $p(\mathcal{T})$, consisting of a small support set for adaptation and a query set for testing. For each episode, the inner loop first updates the model parameters using the support samples to obtain task-specific parameters that allow fast adaptation to novel agricultural conditions [27]. In the outer loop, the meta-learner then pools information from several tasks to improve the initialization so that the model generalizes well across unseen crop varieties and disease conditions. This two-level optimization encourages the network [28] to learn transferable knowledge that requires only a few gradient updates on new tasks, substantially reducing retraining time and labeled-data requirements. The entire training and adaptation process is summarized in Algorithm 1.
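| Algorithm 1 Few-Shot Agricultural Monitoring Training. |
| Require: Dataset $\mathcal{D}$, task distribution $p(\mathcal{T})$, learning rates $\alpha$, $\beta$ |
| Ensure: Optimized parameters $\theta^{*}$ |
| 1: Initialize $\theta$ randomly |
| 2: for each episode do |
| 3: Sample batch of tasks $\{\mathcal{T}_i\}_{i=1}^{B} \sim p(\mathcal{T})$ |
| 4: for each $\mathcal{T}_i$ do |
| 5: Sample support set $S_i$ and query set $Q_i$ |
| 6: Compute task-adapted parameters: $\theta_i' = \theta - \alpha \nabla_{\theta} \mathcal{L}_{S_i}(f_{\theta})$ |
| 7: Evaluate query loss: $\mathcal{L}_{Q_i}(f_{\theta_i'})$ |
| 8: end for |
| 9: Update meta-parameters: $\theta \leftarrow \theta - \beta \nabla_{\theta} \frac{1}{B} \sum_{i=1}^{B} \mathcal{L}_{Q_i}(f_{\theta_i'})$ |
| 10: end for |
| 11: return $\theta^{*}$ |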
2.8. Loss Function Design
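The overall learning objective is formulated as a weighted [29] combination of multiple complementary loss terms that jointly optimize classification accuracy, temporal coherence, feature discrimination, and model regularization. The total loss is expressed as follows:
$$\mathcal{L}_{\text{total}} = \lambda_{1} \mathcal{L}_{\text{cls}} + \lambda_{2} \mathcal{L}_{\text{temp}} + \lambda_{3} \mathcal{L}_{\text{con}} + \lambda_{4} \mathcal{L}_{\text{reg}} \tag{21}$$
where $\lambda_{1}$, $\lambda_{2}$, $\lambda_{3}$, and $\lambda_{4}$ are non-negative hyperparameters that control the relative importance of each component. Their values are empirically tuned to ensure a balanced optimization process.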
The classification loss, $\mathcal{L}_{\text{cls}}$, enforces correct label prediction for each query sample using a standard cross-entropy formulation as follows:
$$\mathcal{L}_{\text{cls}} = -\frac{1}{N_Q} \sum_{j=1}^{N_Q} \log p(y = y_j \mid x_j) \tag{22}$$
where $N_Q$ is the number of query samples in a task, $x_j$ is the query image, $y_j$ is its ground-truth label, and $p(y = y_j \mid x_j)$ denotes the predicted probability of class $y_j$. This loss drives the model to assign high probability to the correct class, thus optimizing discriminative classification performance within few-shot scenarios.
The contrastive loss, $\mathcal{L}_{\text{con}}$, enhances the discriminative capacity of the learned feature embeddings by encouraging intra-class compactness and inter-class separability as follows:
$$\mathcal{L}_{\text{con}} = \sum_{i,j} \Big[ y_{ij}\, \lVert z_i - z_j \rVert_2^2 + (1 - y_{ij}) \max\big(0,\; m - \lVert z_i - z_j \rVert_2\big)^2 \Big] \tag{23}$$
where $z_i$ and $z_j$ denote the feature vectors of samples i and j, $y_{ij} = 1$ if both samples belong to the same class and 0 otherwise, and m is a positive margin controlling inter-class separation. This loss ensures that samples from the same class are pulled closer in feature space, while samples from different classes are pushed apart by at least the margin m.
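A margin-based contrastive loss of the kind described here can be sketched in a few lines. The squared-hinge form and the averaging over all unordered pairs are assumptions (the classic pairwise formulation), and the embeddings, labels, and margins are toy values chosen so the result can be checked by hand.

```python
import math

def contrastive_loss(embeddings, labels, margin):
    """Mean contrastive loss over all unordered pairs of samples.

    Same-class pairs are penalized by their squared distance. Different-class
    pairs are penalized only when they are closer than the margin.
    """
    total, pairs = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                total += d ** 2                      # pull same-class pairs together
            else:
                total += max(0.0, margin - d) ** 2   # push different-class pairs apart
            pairs += 1
    return total / pairs

# Hypothetical embeddings: two identical same-class points and one distant point.
emb = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]
lab = [0, 0, 1]
loss_small_margin = contrastive_loss(emb, lab, margin=1.0)  # classes already 5 apart
loss_large_margin = contrastive_loss(emb, lab, margin=6.0)  # margin not yet satisfied
print(loss_small_margin, loss_large_margin)
```

Once every different-class pair is separated by at least the margin and same-class pairs coincide, the loss is zero, so the term stops influencing training and leaves the classification loss to dominate.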
The regularization loss, $\mathcal{L}_{\text{reg}}$, prevents overfitting and encourages smoother weight updates; an $L_2$ regularization term is added:
$$\mathcal{L}_{\text{reg}} = \lVert \theta \rVert_2^2 \tag{24}$$
where $\theta$ represents all trainable parameters of the network. This term acts as a weight decay mechanism, reducing large parameter magnitudes that may cause unstable adaptation during meta-learning.
To determine the optimal loss-weight coefficients , we performed a grid-search tuning procedure on the meta-validation split of the PlantVillage dataset. The initial search space was , , , and . Each configuration was evaluated based on the mean 5-shot validation accuracy across 600 meta-validation episodes. The configuration ( = 1.0, = 0.5, = 0.3, = ) achieved the highest stability and accuracy, with minimal variance across runs. Thus, this setting was adopted for all experiments. These values provide a balance between classification accuracy, temporal smoothness, discriminative feature learning, and regularization.
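The final optimization problem becomes the following:
$$\theta^{*} = \arg\min_{\theta} \; \mathbb{E}_{\mathcal{T}_i \sim p(\mathcal{T})} \big[ \mathcal{L}_{\text{total}}(\theta; \mathcal{T}_i) \big] \tag{25}$$
where $\theta^{*}$ denotes the optimized model parameters learned through the meta-training process.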
2.9. Architectural Overview
The architectural design of AgriFewNet combines several complementary modules that together support few-shot detection of agricultural diseases when data is scarce. As shown in Figure 1, the architecture starts with the acquisition of the RGB image, which is the main visual modality for capturing the disease-specific color, texture, and structure of plant leaves. The RGB input is then passed to a hierarchical attention-based feature extraction network built on a modified ResNet-18 backbone and augmented with spatial and channel attention mechanisms. This design enables the model to attend selectively to the lesion regions that carry the most information, which improves performance under changes in illumination, in the presence of background noise, and even when disease symptoms are barely visible.
Figure 1.
Proposed few-shot learning architecture for agricultural monitoring.
The proposed dual-attention architecture allows the network to selectively attend to disease-related regions and feature channels, thereby suppressing background noise and illumination variations. The resulting discriminative embeddings are then used as input to a prototype-based classification layer in which each crop disease class is represented by a centroid (prototype) in the learned feature space, so that classification is performed using distance-based similarity. This agro-smart system includes components for temporal consistency, prototype-based categorization, hierarchical feature extraction, and MAML-based meta-learning adaptation.
AgriFewNet employs a Model-Agnostic Meta-Learning (MAML) strategy that essentially adjusts the model’s starting point for efficient refinement with just a few labeled examples. This enables a rapid adaptation to unseen agricultural tasks. The different components thus form a cohesive system of an end-to-end adaptive learning architecture that is able to reach high accuracy and fast convergence by balancing generalization and specificity.
2.10. Statistical Analysis
To ensure rigorous and reproducible evaluation of the proposed AgriFewNet framework, all experiments were subjected to a comprehensive statistical analysis protocol. Each N-way K-shot experiment was repeated five times with different random seeds, and the mean ± standard deviation was reported for accuracy, precision, recall, F1-score, mAP, and adaptation steps. This repetition accounts for randomness in task sampling and ensures reliable estimation of model stability, particularly in few-shot settings where class distributions vary between episodes.
To determine whether AgriFewNet significantly outperformed baseline few-shot learning methods, pairwise independent t-tests were conducted across repeated trials. Because multiple comparisons were performed (AgriFewNet versus MAML, Prototypical Networks, Relation Networks, Fine-tuning, and Transfer Learning), a Bonferroni correction was applied to control the family-wise error rate. Statistical significance thresholds were set as follows: p < 0.05 (*), p < 0.01 (**), and p < 0.001 (***). For convergence behavior and cross-domain adaptation analysis, 95% confidence intervals were computed from the empirical variance of repeated experiments. Additionally, to compare adaptation step requirements across methods, a one-way ANOVA test was employed to quantify whether differences in convergence speed were statistically significant.
All statistical computations, including significance testing, confidence interval estimation, and variance analysis, were implemented using Python 3.13 libraries SciPy, NumPy, and StatsModels. The procedures described above ensure that reported improvements are statistically meaningful, reproducible, and representative of real-world variability in few-shot agricultural classification tasks.
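Two steps of this protocol, the mean ± standard deviation summary and the Bonferroni adjustment with significance stars, can be sketched with the standard library alone. All numbers below are hypothetical placeholders rather than results from the study. The use of the sample standard deviation is an assumption, and in practice the raw p-values would come from a t-test routine such as `scipy.stats.ttest_ind`.

```python
import statistics

def mean_std(values):
    """Mean and sample standard deviation across repeated runs."""
    return statistics.mean(values), statistics.stdev(values)

def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def stars(p):
    """Significance markers using the thresholds stated in the text."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."

# Hypothetical accuracies from five seeds (not values from the study).
runs = [0.86, 0.88, 0.87, 0.89, 0.85]
mean_acc, std_acc = mean_std(runs)

# Hypothetical raw p-values for five baseline comparisons.
raw_p = [0.0001, 0.004, 0.03, 0.2, 0.0005]
adjusted = bonferroni(raw_p)
markers = [stars(p) for p in adjusted]
print(round(mean_acc, 3), round(std_acc, 4), markers)
```

With five comparisons, a raw p-value must fall below 0.01 to remain significant at the 0.05 level after correction, which is why the third hypothetical comparison loses its marker.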
3. Results
This section describes the experiments conducted to evaluate the effectiveness of the proposed AgriFewNet framework. The results are presented in order, covering cross-domain flexibility, scaling across few-shot learning conditions, and classification accuracy. These evaluations are further supported by quantified temporal stability and ablation experiments. To demonstrate the robustness and superiority of the proposed AgriFewNet method, the findings are also compared with state-of-the-art techniques.
3.1. Comparative Performance Analysis
The proposed AgriFewNet framework is assessed using the comparative performance analysis against several well-known few-shot learning baselines, such as the MAML, Prototypical Networks, Relation Networks, Transfer Learning, and Fine-Tuning models. Model scalability, inference efficiency, classification accuracy, and flexibility across various few-shot configurations (1-, 5-, and 10-shot) are among the many factors that are the focus of the evaluation. AgriFewNet consistently performs better than all rival models in all setups, according to the results, showing faster convergence and higher accuracy with less parameter overhead. Meta-learning adaptability and hierarchical attention processes work in concert to improve discriminative feature learning and domain generalization, which is largely responsible for the performance improvements. Furthermore, statistical validation demonstrates the strong and repeatable performance of AgriFewNet across a variety of agricultural datasets, confirming the significance of the reported gains (p < 0.001).
3.1.1. Few-Shot Classification Accuracy
The proposed AgriFewNet model performs better in every few-shot learning scenario, as seen in Table 1. In the difficult one-shot learning case, it attains 87.3 ± 1.2% accuracy, a significant improvement of 7.9 percentage points over MAML (79.4 ± 2.1%) and 5.7 percentage points over Prototypical Networks (81.6 ± 1.8%).
Table 1.
Few-shot classification accuracy comparison.
This improvement is attributed to the synergistic combination of the attention mechanisms and temporal consistency modeling.
As the number of support examples increases, the performance gap remains substantial. In five-shot learning, the proposed AgriFewNet method attains 94.8 ± 0.8% accuracy, outperforming MAML by 5.6 percentage points and Prototypical Networks by 4.1 percentage points. The 10-shot scenario yields 97.1 ± 0.6% accuracy, approaching near-optimal classification performance while maintaining computational efficiency with only 11.2 M parameters and 8.7 ms inference time per image.
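The accuracies above are measured per few-shot episode. As a rough illustration of how a prototype-based classifier scores one N-way K-shot episode, the sketch below averages the support embeddings of each class and assigns every query to the nearest prototype. It is a simplified stand-in, not the AgriFewNet implementation: the embeddings are hypothetical 2-D vectors rather than ResNet-18 features, the Euclidean distance is an assumption, and all function names are illustrative.

```python
import math


def prototypes(support):
    """Average the support embeddings of each class into one prototype."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos


def classify(query, protos):
    """Assign the query to the class with the nearest prototype (Euclidean)."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))


def episode_accuracy(support, queries):
    """Fraction of (embedding, label) query pairs classified correctly."""
    protos = prototypes(support)
    correct = sum(classify(x, protos) == y for x, y in queries)
    return correct / len(queries)


# Toy 2-way 2-shot episode with hypothetical 2-D embeddings.
support = {
    "early_blight": [[0.9, 0.1], [1.1, -0.1]],
    "healthy": [[-1.0, 0.2], [-0.8, 0.0]],
}
queries = [
    ([1.0, 0.3], "early_blight"),
    ([-0.7, -0.2], "healthy"),
    ([0.2, 0.0], "early_blight"),
]
acc = episode_accuracy(support, queries)
print(f"episode accuracy: {acc:.2f}")
```

Averaging this episode accuracy over many randomly sampled episodes gives figures of the kind reported in Table 1.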
The meta-training convergence properties of the proposed AgriFewNet model are illustrated in Figure 2. Within the first 2000 episodes, the training loss converges quickly. It then refines gradually, stabilizing at about 0.02 after 8000 episodes. With little fluctuation, the validation accuracy curve shows a consistent improvement from 85% at initialization to a plateau of 96.3% at convergence, suggesting stable learning dynamics and successful generalization to new tasks.
Figure 2.
Meta-training loss convergence (top) and validation accuracy progression (bottom) over 10,000 episodes.
3.1.2. Performance Scaling with Support Examples
A detailed comparison of classification accuracy as a function of support example quantity is shown in Figure 3. The accuracy of the proposed AgriFewNet approach increases from 87.3% (1-shot) to 94.8% (5-shot), 97.1% (10-shot), 98.2% (15-shot), and finally 98.5% (20-shot), demonstrating favorable scaling behavior. The diminishing returns after ten shots suggest that the model captures discriminative features from a small number of samples, a crucial benefit for real-world agricultural surveillance, where large labeled datasets are unaffordable.
Figure 3.
Classification accuracy as a function of support example quantity (K-shot learning).
The performance curves show that the proposed AgriFewNet methodology performs better in all shot configurations, with the biggest difference occurring in low-shot situations (one-shot and five-shot), where conventional transfer learning and fine-tuning techniques falter. The baseline fine-tuning method reaches only 65.2 ± 3.4% in one-shot learning, which highlights the fundamental problem of quick adaptation that few-shot learning addresses.
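The diminishing returns described in this subsection can be made explicit by dividing each accuracy gain by the number of support examples added. The short sketch below does this for the accuracies reported above; the function name `marginal_gains` is illustrative.

```python
shots = [1, 5, 10, 15, 20]
accuracy = [87.3, 94.8, 97.1, 98.2, 98.5]  # reported accuracies (%)


def marginal_gains(shots, accuracy):
    """Accuracy gain per additional support example between consecutive settings."""
    gains = []
    for i in range(1, len(shots)):
        delta_acc = accuracy[i] - accuracy[i - 1]
        delta_k = shots[i] - shots[i - 1]
        gains.append(delta_acc / delta_k)
    return gains


gains = marginal_gains(shots, accuracy)
for (k0, k1), g in zip(zip(shots, shots[1:]), gains):
    print(f"{k0:>2} -> {k1:>2} shots: {g:.3f} points per extra example")
```

The gain per added example falls from roughly 1.9 percentage points between 1 and 5 shots to under 0.1 percentage points between 15 and 20 shots.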
3.2. Cross-Domain Adaptation Capabilities
The results of the cross-domain adaptation are shown in Table 2 and illustrate how the model transfers knowledge between crop species and disease types [30]. The adaptation from tomato diseases to potato diseases achieves 89.4 ± 1.5% accuracy after only a small number of gradient steps, showing good feature sharing among morphologically related crops in the Solanaceae family. The transfer from healthy to diseased leaves achieves the highest accuracy, 91.2 ± 1.3%, and likewise requires few adaptation steps, which indicates that the model learns robust representations of healthy baseline conditions.
Table 2.
Cross-domain adaptation results.
The accuracy of more difficult cross-species transfers, such as those from corn diseases to grape diseases, remains high at 86.7 ± 1.8%, but these transfers require more adaptation steps. This demonstrates the ability of the model to generalize across many agricultural scenarios while preserving computational efficiency. The rapid adaptation capacity greatly benefits deployment scenarios where new crop types or emerging diseases need to be swiftly integrated into monitoring systems without significant retraining.
3.3. Attention Mechanism Effectiveness
Four representative disease cases, tomato early blight (Figure 4), corn northern leaf blight (Figure 5), apple scab (Figure 6), and potato late blight (Figure 7), are used to illustrate the learned attention patterns. The spatial attention module locates disease-affected regions successfully, with high activation values (red coloring) that closely match lesion boundaries and symptomatic locations. As seen in Figure 8, the spatial emphasis for tomato early blight is focused on circular necrotic patches with distinctive concentric rings. For corn northern leaf blight, the focus is on the long, cigar-shaped lesions that run parallel to the leaf veins.
Figure 4.
Visualization of the tomato attention mechanism.
Figure 5.
Visualization of the corn attention mechanism.
Figure 6.
Visualization of the apple attention mechanism.
Figure 7.
Visualization of the potato attention mechanism.
Figure 8.
Detailed representation of the tomato early blight attention system.
By emphasizing feature maps that capture disease-relevant texture, color, and morphological patterns, the channel attention module complements spatial attention. The combined attention output suppresses background noise and improves the contrast between diseased and healthy tissue.
Quantitative Analysis of Attention Localization
To complement the qualitative heatmaps, we evaluated attention accuracy using two mask-free metrics: Attention Localization Score (ALS) and Attention Precision (AP). Table 3 shows that AgriFewNet achieves the highest ALS (0.78) and AP (0.74), outperforming the baseline ResNet-18 (0.61 and 0.57). Spatial and channel attention individually improve localization, while the dual-attention configuration yields the most focused disease-region activation. These results confirm that the proposed attention mechanism significantly enhances lesion-aware feature extraction compared to conventional backbones.
Table 3.
Quantitative metrics for attention localization.
3.4. Detailed Performance Metrics
The complete performance metrics that compare the original PlantVillage and New PlantVillage datasets are displayed in Table 4. The New PlantVillage dataset’s improved annotations consistently increase every metric: Overall accuracy rises from 94.8 ± 0.8% to 96.3 ± 0.6%, precision advances from 94.2 ± 0.9% to 95.9 ± 0.7%, recall advances from 94.5 ± 0.8% to 96.1 ± 0.6%, and F1-score improves from 94.3 ± 0.8% to 96.0 ± 0.6%.
Table 4.
Detailed performance metrics (5-shot learning).
The mean Average Precision (mAP) metric, which measures performance at different confidence levels, improves from 95.1 ± 0.7% to 96.8 ± 0.5%. Further evidence that improved annotations lead to more efficient learning comes from the adaptation time, which requires fewer gradient steps on the New PlantVillage dataset. These results demonstrate the need for high-quality ground-truth labels in few-shot learning, where the model must learn as much as it can from a limited number of samples.
With 98.2 ± 0.4% accuracy on the New PlantVillage dataset, the 10-shot learning results in Table 5 show near-optimal classification performance, approaching the practical upper bound imposed by annotation ambiguity and inter-class similarity. The gain over five-shot learning (+1.9 percentage points) shows diminishing returns, indicating that feature learning is effectively saturated at 5–10 examples. The reduced adaptation time of 33.5 ± 4.8 steps confirms effective convergence under more supervision. A drop in the standard deviation from ±0.6% to ±0.4% suggests improved prediction stability. Since ten labeled examples per disease class are easy to obtain, these metrics support the practicality of agricultural deployment.
Table 5.
Detailed performance metrics (10-shot learning).
Although the PlantVillage and New PlantVillage datasets provide broad coverage of crop disease categories, the distribution of samples across classes is inherently imbalanced, with several disease types containing substantially fewer images than healthy classes. Such class imbalance can influence macro-averaged metrics because each class contributes equally to the macro precision, recall, and F1-score regardless of its frequency. To provide a more comprehensive and distribution-aware evaluation, we additionally computed weighted-average metrics, where each class contribution is scaled by its proportional frequency in the dataset. The weighted metrics offer a more realistic estimate of model performance under skewed class distributions typically observed in agricultural settings. As shown in Table 6, the weighted precision, recall, and F1-scores remain consistently high and closely aligned with macro averages, indicating that AgriFewNet maintains stable performance even on minority classes. This confirms the robustness of the attention-enhanced meta-learning framework in handling rare disease categories and minimizing performance degradation due to dataset imbalance.
Table 6.
Weighted vs. macro-average performance metrics (5-shot learning).
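The difference between the two averaging schemes can be illustrated with a small sketch. The per-class F1-scores and class sizes below are hypothetical and chosen only to show how a rare, harder class pulls the macro average down more than the weighted average; they are not values from Table 6.

```python
def macro_average(scores):
    """Unweighted mean: every class counts equally."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores, support):
    """Mean weighted by the number of samples in each class."""
    total = sum(support.values())
    return sum(scores[c] * support[c] for c in scores) / total


# Hypothetical per-class F1-scores (%) and class sizes.
f1 = {"healthy": 98.0, "common_disease": 95.0, "rare_disease": 89.0}
n = {"healthy": 1500, "common_disease": 400, "rare_disease": 100}

macro = macro_average(f1)
weighted = weighted_average(f1, n)
print(f"macro F1 = {macro:.2f}, weighted F1 = {weighted:.2f}")
```

When the two averages are close, as reported for AgriFewNet, performance on minority classes is not far below performance on the frequent ones.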
3.5. Per-Class Performance Analysis
As demonstrated in Table 7, performance varies across disease classes, which has significant practical deployment implications. The top-performing classes include healthy specimens (Apple: 98.7 ± 0.4%, Grape: 98.3 ± 0.5%, and Strawberry: 97.9 ± 0.6%) and diseases with distinctive visual characteristics (Grape Black Rot: 97.5 ± 0.7%, and Apple Black Rot: 97.2 ± 0.8%). Even with fewer examples, the few-shot learning model successfully captures the distinct discriminative features of these classes.
Table 7.
Per-class performance analysis (top/bottom 5 classes).
However, difficult classes like Potato Early Blight (90.8 ± 1.7%), Tomato Target Spot (90.4 ± 1.8%), and Corn Gray Leaf Spot (89.2 ± 2.1%) show lower accuracy. These illnesses have morphological similarities to other conditions, show significant variability in appearance, and have mild symptoms in the early stages. For example, both Corn Common Rust and Corn Gray Leaf Spot appear as long lesions on leaves, which can be confusing even for knowledgeable agronomists.
Figure 9 displays the confusion matrix for the top 10 classes in five-shot learning. While off-diagonal elements show systematic error patterns, the diagonal dominance (96–98% values) confirms strong overall performance. Because members of the Solanaceae family share morphological characteristics, there are notable confusions between diseases that affect related host plants (for example, Potato Late Blight is misclassified as Tomato Early Blight in 2% of cases). There is little cross-confusion between healthy classes, and the majority of mistakes occur when healthy plants are misclassified as diseased, suggesting a conservative bias toward disease detection.
Figure 9.
Confusion matrix for top 10 classes in a 5-shot learning scenario.
With 98.0% accuracy in all ten classes, the confusion matrix in Figure 10 shows outstanding classification performance. Consistently displaying 98.0% accurate predictions, diagonal values demonstrate strong discriminative ability. Small misclassifications mostly happen between diseases that are visually similar: diseased leaves are sometimes mistaken for healthy ones (0.3–0.8%), and potato late blight is mistaken for tomato early blight (1.0%). Model learning appears to be balanced based on the symmetric error distribution. The effectiveness of the suggested few-shot learning methodology in agricultural disease classification with limited training samples is validated by the notable lack of cross-species disease confusion (<0.5%) and the high specificity (98.0%) maintained by healthy classes.
Figure 10.
Confusion matrix for top 10 classes in a 10-shot learning scenario.
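Quantities such as per-class accuracy and the dominant confusion can be read directly off a confusion matrix. The sketch below shows one way to compute them; the three-class matrix is hypothetical and much smaller than the matrices in Figures 9 and 10, and the function names are illustrative.

```python
def per_class_recall(cm):
    """Recall for each class: diagonal entry divided by its row sum."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


def per_class_precision(cm):
    """Precision for each class: diagonal entry divided by its column sum."""
    n = len(cm)
    return [cm[j][j] / sum(cm[i][j] for i in range(n)) for j in range(n)]


def most_confused_pair(cm):
    """Return (true_class, predicted_class, count) for the largest off-diagonal entry."""
    best = (None, None, -1)
    for i, row in enumerate(cm):
        for j, count in enumerate(row):
            if i != j and count > best[2]:
                best = (i, j, count)
    return best


# Hypothetical counts; rows are true classes, columns are predictions.
labels = ["potato_late_blight", "tomato_early_blight", "tomato_healthy"]
cm = [
    [97, 2, 1],
    [1, 98, 1],
    [0, 2, 98],
]

recall = per_class_recall(cm)
precision = per_class_precision(cm)
true_idx, pred_idx, count = most_confused_pair(cm)
print(f"most frequent error: {labels[true_idx]} -> {labels[pred_idx]} ({count} samples)")
```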
A closer examination of the misclassified samples indicates that the performance gaps among the low-performing classes arise from a combination of visual ambiguity and dataset inconsistencies. As summarized in Table 8, several disease categories exhibit closely resembling morphological patterns, especially those involving elongated leaf lesions, circular necrotic spots, or gradual color transitions that closely match the symptoms of neighboring disease classes. These similarities often guide the model toward shared texture patterns instead of class-specific cues, increasing confusion among visually overlapping categories. Additionally, certain minority classes contain subtle or early-stage symptoms that provide very low contrast against the surrounding leaf tissue, making it difficult for the feature extractor to capture reliable discriminative features. Variability in annotation quality, including occasional labeling errors and variations in illumination, background, and leaf orientation, further increases the likelihood of misclassification. Overall, this analysis shows that the main challenges stem not only from modeling limitations but also from the inherent complexity of these visually similar disease patterns and inconsistencies within the dataset. These insights highlight the importance of incorporating more fine-grained feature extraction strategies and improving dataset curation to reduce ambiguity in future work.
Table 8.
Common error sources in low-performance disease classes.
3.6. Ablation Study and Component Analysis
A systematic ablation study that quantifies the contribution of each architectural component is given in Table 9. Attention mechanisms play an important role in discriminative region localization, as evidenced by the 3.6 percentage point accuracy drop (from 94.8% to 91.2%) that occurs when they are removed. Performance is reduced by 2.1 percentage points when the temporal consistency module is ablated, confirming its significance for applications involving sequential monitoring.
Table 9.
Component contribution analysis.
Data augmentation yields a noteworthy improvement of 5.4 percentage points, underscoring the significance of exposure to a range of visual conditions throughout meta-training. The model’s accuracy of 89.4% without augmentation shows limited generalization and overfitting to particular imaging conditions. The cumulative effect of all components demonstrates the complementary nature of the proposed architecture, where each component tackles a distinct challenge in agricultural few-shot learning.
Table 10 presents a sensitivity analysis of reduction ratio values (r∈ {4, 8, 16, 32}) used in the channel attention module. The results illustrate how varying r influences parameter count, accuracy, and model efficiency. While smaller r values improve feature richness, they increase complexity; larger values lead to underfitting. The selected r = 16 offers the best performance–efficiency balance.
Table 10.
Sensitivity analysis of the channel attention reduction ratio (r).
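To make the trade-off behind the reduction ratio concrete, the sketch below counts the weights of a channel attention bottleneck for each tested value of r. It rests on two assumptions that are not confirmed by the text: that the module uses a bias-free two-layer MLP of the form C → C/r → C, as in squeeze-and-excitation or CBAM-style channel attention, and that it is applied to a 512-channel feature map (the width of the last ResNet-18 stage). The counts cover a single attention block only and are not the values of Table 10.

```python
def channel_attention_params(channels, r):
    """Weights in a bias-free two-layer bottleneck MLP: C -> C/r -> C."""
    hidden = channels // r
    return channels * hidden + hidden * channels


channels = 512  # assumed: width of the last ResNet-18 stage
for r in (4, 8, 16, 32):
    print(f"r = {r:>2}: {channel_attention_params(channels, r):>7} weights")
```

Under these assumptions, each doubling of r halves the number of attention weights, which is why small r values add capacity at the cost of complexity.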
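The ablation results in Table 11 demonstrate that the selected weight configuration (values of 1.0, 0.5, and 0.3 for the first three weights, with the remaining weight as listed in Table 11) provides the optimal balance between convergence stability and discriminative feature learning under few-shot conditions.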
Table 11.
Effect of weight selection on 5-shot validation accuracy.
3.7. Computational Efficiency and Scalability
A thorough examination of resource usage is shown in Table 12. The proposed method requires 8.7 GB of memory, 2.1 GFLOPs per inference, and 4.2 h of training time. Compared with typical fine-tuning approaches, which require 8.3 h of training time and 15.6 GB of memory, our method reduces training time and memory use by 49% and 44%, respectively, while maintaining superior accuracy.
Table 12.
Resource utilization comparison.
The model’s 11.2 MB size makes it possible to deploy on edge devices with limited storage capacity, as required by field-deployable agricultural monitoring systems. The 8.7 ms inference time per image supports real-time processing applications such as drone-based surveillance and automated greenhouse monitoring [31]. The small number of gradient steps needed for adaptation enables deployment to novel agricultural contexts in a matter of minutes, as opposed to the hours or days needed by conventional transfer learning techniques.
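The efficiency figures in this subsection follow from simple ratios, which the sketch below reproduces from the values stated in the text. The throughput estimate additionally assumes that images are processed one at a time with no batching, preprocessing, or I/O overhead, which is an idealization.

```python
def relative_reduction(ours, baseline):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (1.0 - ours / baseline)


train_time_cut = relative_reduction(4.2, 8.3)   # training hours
memory_cut = relative_reduction(8.7, 15.6)      # memory in GB
throughput = 1000.0 / 8.7                       # images per second at 8.7 ms each

print(f"training time reduced by {train_time_cut:.1f}%")
print(f"memory reduced by {memory_cut:.1f}%")
print(f"about {throughput:.0f} images per second")
```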
3.8. Robustness Under Adverse Conditions
As shown in Figure 11, model performance is assessed under realistic adverse field conditions. The model’s accuracy in five-shot learning is 94.8% under optimal imaging conditions. Under low light, performance deteriorates slightly to 91.5%, highlighting the advantage of attention mechanisms that improve contrast in areas with low illumination [32].
Figure 11.
Performance across various shot configurations in challenging circumstances.
In high-noise scenarios, which resemble atmospheric interference or sensor degradation, accuracy drops to 88.7%, a decrease of 6.1 percentage points. Motion blur, which frequently occurs in automated or drone-based systems, reduces accuracy to 85.4%. Occlusion by environmental factors (dust, rain, and overlapping leaves) is the most difficult condition, lowering accuracy to 82.3%, which remains usable for real-world applications.
In higher-shot settings, the performance decline is less severe; even under occlusion, 10-shot learning maintains an accuracy above 90%. This implies that extra support examples offer redundancy that compensates for unfavorable circumstances, a useful feature for reliable real-world deployment. The attention mechanisms concentrate on less damaged areas and feature channels, which partly alleviates the negative effects of adverse conditions.
3.9. Adaptation Speed and Learning Efficiency
The adaptation speed for various shot configurations and methodologies is shown in Figure 12. The proposed AgriFewNet approach needs 41 gradient steps to converge in five-shot learning, which is 38.8% fewer than MAML’s 67 steps and close to the 35 steps of Prototypical Networks, while still achieving higher accuracy. The large advantage over fine-tuning (132 steps) supports the meta-learning approach for quick adaptation scenarios.
Figure 12.
Gradient steps required for convergence across various techniques and shot configurations.
The number of adaptation steps rises to 45 in one-shot learning, reflecting the increased difficulty of extracting enough information from single examples. Ten-shot learning, on the other hand, reduces adaptation to 38 steps since more examples offer richer supervision. The inverse relationship between adaptation time and support set size suggests that the meta-learned initialization uses the available information effectively.
The capacity for quick adaptation has significant ramifications for real-world implementation. In the case of an emerging disease outbreak, the proposed method can be adapted in about 6 min using fewer than 50 examples, whereas traditional approaches require gathering thousands of examples and retraining for days. This responsiveness is essential for managing diseases and implementing timely agricultural interventions.
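The link between a good initialization and a small number of adaptation steps can be illustrated with a toy problem. The sketch below is not the AgriFewNet inner loop: it runs plain gradient descent on a one-parameter squared error and counts the steps needed to reach a tolerance from a starting point near the optimum (standing in for a meta-learned initialization) and from one far away (standing in for a generic initialization). All values other than the 41 and 67 step counts quoted from the text are hypothetical.

```python
def adaptation_steps(w0, target, lr=0.05, tol=1e-4, max_steps=1000):
    """Count gradient steps until the squared error falls below `tol`."""
    w = w0
    for step in range(max_steps + 1):
        loss = (w - target) ** 2
        if loss < tol:
            return step
        grad = 2.0 * (w - target)  # derivative of the squared error
        w -= lr * grad
    return max_steps


near = adaptation_steps(w0=0.8, target=1.0)   # stands in for a meta-learned start
far = adaptation_steps(w0=-4.0, target=1.0)   # stands in for a generic start
print(f"steps from near start: {near}, from far start: {far}")

# Relative reduction in steps reported in the text: 41 (AgriFewNet) vs. 67 (MAML).
fewer_steps_pct = 100.0 * (67 - 41) / 67
print(f"{fewer_steps_pct:.1f}% fewer steps than MAML")
```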
3.10. Statistical Significance Analysis
Table 13 reports the results of pairwise statistical significance tests performed using independent t-tests with Bonferroni correction for multiple comparisons. All performance improvements of the proposed AgriFewNet method over the baseline approaches achieve statistical significance across shot configurations (the p-values are listed in Table 13), so it is highly unlikely that the observed differences are the result of chance.
Table 13.
Pairwise significance tests (p-values).
The comparison with Prototypical Networks shows slightly less pronounced but still highly significant improvements (the p-values for each shot configuration are listed in Table 13). These statistical validations indicate that the advantage of the proposed method reflects a genuine methodological improvement rather than favorable dataset sampling or random variation.
The low standard deviations across experimental repetitions for the proposed method, compared with those of the baselines, also contribute to this statistical strength. They indicate stable results and reliable reproducibility, two features that are necessary both for scientific confirmation and for trust in practical application.
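The testing procedure described in this subsection can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis script: the accuracies and raw p-values are hypothetical, the unequal-variance (Welch) form of the independent t-test is assumed because the text does not say which variant was used, and only the test statistic and degrees of freedom are computed here. In practice the p-value would be obtained from the t distribution, for example with `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical accuracies (%) from repeated runs of two methods.
ours = [94.1, 95.3, 94.8, 95.0, 94.6]
baseline = [89.0, 90.2, 88.5, 89.9, 88.9]
t, df = welch_t(ours, baseline)
print(f"t = {t:.2f}, df = {df:.1f}")

# Hypothetical raw p-values for three pairwise comparisons.
adjusted = bonferroni([0.0002, 0.004, 0.03])
print(adjusted)
```

With the Bonferroni correction, a comparison is declared significant only if its adjusted p-value stays below the chosen level, which guards against false positives when several baselines are tested.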
3.11. Learning Dynamics and Convergence Behavior
The meta-training dynamics shown by the learning curves in Figure 2 provide insight into the learning process. The rapid early reduction in loss (episodes 0–2000) corresponds to the model learning simple discriminative attributes that are common across the agricultural tasks. The subsequent gradual refinement stage (episodes 2000–8000) corresponds to meta-optimization of the adaptation mechanism itself, in which the model learns how to use the limited examples effectively to acquire task-specific knowledge.
The validation accuracy curve follows a three-stage pattern: rapid initial improvement (85–90%, episodes 0–1500), roughly linear growth (90–95%, episodes 1500–6000), and a plateau (95–96.3%, episodes 6000–10,000). The plateau onset at episode 6000 indicates that the returns to additional training beyond this point are diminishing, which can inform decisions about the effective training duration. The absence of overfitting (no divergence between the training and validation curves) indicates effective regularization and strong generalization.
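A plateau onset of this kind can be located automatically, which is one way to choose the training duration. The sketch below flags the first checkpoint after which the average accuracy over the next few checkpoints improves by less than a threshold. The validation curve is hypothetical, shaped loosely on the stages described above with one checkpoint per 1000 episodes, and the window and threshold are illustrative choices rather than values from the paper.

```python
def plateau_onset(curve, window=3, min_gain=0.1):
    """Index of the first checkpoint whose value is exceeded by the mean of the
    next `window` checkpoints by less than `min_gain` points, or None."""
    for i in range(len(curve) - window):
        ahead = sum(curve[i + 1 : i + 1 + window]) / window
        if ahead - curve[i] < min_gain:
            return i
    return None


# Hypothetical validation accuracies (%), one per 1000 episodes.
val_acc = [85.0, 89.5, 91.2, 92.6, 93.8, 95.0, 96.2, 96.3, 96.3, 96.2, 96.3]
onset = plateau_onset(val_acc)
print(f"plateau begins at checkpoint {onset}")
```

For this illustrative curve the detected onset is the checkpoint at 6000 episodes, which matches the qualitative description of the plateau stage.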
4. Discussion
Recent advances in plant disease detection have increasingly focused on few-shot learning, meta-learning, and attention-enhanced CNN architectures. Approaches such as Prototypical Networks, MAML, and relation-based classifiers have shown promise in low-data environments; however, they often struggle with complex intra-class variability, visually overlapping symptoms, and the need for rapid domain adaptation. Several transformer-based and graph-learning approaches have also been proposed, yet these frequently rely on larger datasets, intensive computational resources, or long training cycles, limiting their practicality in real-world agricultural settings.
Compared with these methods, AgriFewNet offers several technical advantages. First, the dual-attention-enhanced ResNet-18 encoder captures both spatial and channel-level discriminative cues, which enables the model to separate subtle disease patterns that commonly confuse classical CNNs or shallow meta-learning methods. Second, the prototype-based adaptation module improves class separability under few-shot conditions by stabilizing decision boundaries, especially for minority and morphologically similar disease categories. Third, the temporal consistency module improves robustness across training episodes and thereby reduces the fluctuations observed in meta-trained models. Finally, unlike transformer-based or heavier multimodal approaches, AgriFewNet keeps a lightweight architecture that is computationally efficient and can be deployed on the low-resource devices used by farmers and field technicians.
Moreover, the error analysis in Section 3.5, summarized in Table 8, suggests that the system is resilient to data challenges such as class imbalance, early-stage symptoms, and annotation variations. Evaluated with weighted metrics and supported by stable prototypes, the network performs consistently even in visually ambiguous scenarios, where the performance of competing models drops sharply. Thus, while several related technologies support progress in few-shot plant disease detection, AgriFewNet combines efficiency, interpretability, and robustness, which makes it well suited for real-world crop monitoring applications.
4.1. Limitations
Although AgriFewNet shows strong results in few-shot crop disease classification with RGB images, a number of limitations remain. The model is mostly tested on controlled datasets, and its robustness in real-world field conditions, with changes in illumination, complex backgrounds, occlusions, and sensor noise, is not yet confirmed. Class imbalance and subtle disease symptoms make fine-grained discrimination difficult, especially where disease patterns overlap visually. The meta-learning strategy makes the model more adaptable, but its performance might still decrease under extreme domain shifts or when it encounters completely new crop varieties. Moreover, although the model is computationally lightweight, it may need further optimization before deployment on very low-power agricultural devices. While hyperspectral and thermal data acquisition remains expensive, emerging low-cost alternatives such as cross-modal synthetic data generation, RGB-to-hyperspectral translation, and physics-based simulation can provide surrogate multimodal information without real sensor deployment. These directions help mitigate high acquisition costs and open pathways for integrating multimodal cues in future plant disease systems. Overcoming these limitations is important to ensure that the framework is scalable, reliable, and usable by farmers in real operational agricultural environments.
4.2. Future Research Directions
Further research will aim at making AgriFewNet more adaptable, scalable, and applicable in the real world. One major direction is enhancing generalization through large-scale field data collection, semi-supervised learning, and domain adaptation techniques that can alleviate distribution shifts between laboratory datasets and real-world scenarios. In addition, there are several alternatives to multimodal sensing that can be explored without purchasing expensive NIR or thermal sensors. Simulation-based synthetic data generation, cross-modal translation techniques, and virtual sensing strategies can be employed to derive additional modalities directly from RGB images, thereby approximating the sensors. These methods have the potential to enrich the feature space without additional hardware.
Future extensions of AgriFewNet will incorporate cross-modal synthetic data pipelines where hyperspectral or thermal-like signals are generated from RGB images using GANs, diffusion models, or physics-driven simulators. These low-cost alternatives remove the dependency on expensive sensors and allow the model to benefit from multimodal representations while maintaining affordability. Further advancements might include explainable AI elements to raise user confidence and interpretability, extension of the framework to other farm-level tasks such as yield prediction and nutrient deficiency recognition, and lightweight model compression methods to enable deployment on drones and edge devices [33]. Moreover, long-term reliability and operational robustness will be equally important and can only be addressed through comprehensive multi-season trials across various agroclimatic zones.
5. Conclusions
Modern agriculture faces the constant challenge of environmental uncertainty and data scarcity, which calls for smart systems that can adapt quickly. This research presented AgriFewNet, a data-efficient few-shot learning framework designed specifically for plant disease diagnosis with limited labeled data. By combining hierarchical attention-based feature extraction, prototype-driven classification, and MAML-based meta-learning, the model demonstrated strong adaptability to different crop species and disease conditions. The experiments on the PlantVillage and New PlantVillage datasets showed that AgriFewNet reaches accuracies of 87.3%, 94.8%, and 97.1% in the 1-shot, 5-shot, and 10-shot settings, respectively, outperforming the existing few-shot learning baselines by up to 7.9 percentage points. The model also converged quickly, in 41 ± 7 gradient steps, which makes it well suited to real-time agricultural applications where fast adaptation is required. Furthermore, AgriFewNet not only improves accuracy but also learns discriminative representations from RGB images alone, which supports the feasibility of large-scale deployment on inexpensive devices such as smartphones and drones. The evaluation also pointed to advantages in cross-domain generalization, stability under environmental variation, and reliability across repeated experiments. To sum up, AgriFewNet is a compact and scalable solution for smart crop monitoring in areas with limited resources.
Author Contributions
Conceptualization, R.R.N. and T.B.; Methodology, R.R.N. and B.B.; Software, T.B.; Validation, R.R.N. and T.B.; Formal Analysis, R.R.N.; Investigation, T.B.; Resources, W.H.K. and J.N.; Data Curation, T.B.; Writing—Original Draft Preparation, R.R.N. and T.B.; Writing—Review and Editing, B.B.; Visualization, T.B.; Supervision, R.R.N.; Project Administration, R.R.N.; Funding Acquisition, W.H.K. and J.N. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The Plant Diseases Dataset is available at https://www.kaggle.com/datasets/emmarex/plantdisease (12 February 2025) and the New Plant Diseases Dataset is available at https://www.kaggle.com/datasets/vipoooool/new-plant-diseases-dataset/data (20 April 2025). All model configurations, experimental codes, and additional materials can be obtained from the corresponding author upon reasonable request.
Acknowledgments
The authors acknowledge that language-editing assistance was used solely for improving grammar and readability. No external service contributed to the scientific content, analysis, or conclusions of this manuscript. The authors take full responsibility for all content presented.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| DL | Deep Learning |
| AgriFewNet | Agricultural Few-Shot Network |
| FSL | Few-Shot Learning |
| MAML | Model-Agnostic Meta-Learning |
| RGB | Red Green Blue |
| NIR | Near-Infrared |
| AI | Artificial Intelligence |
| ML | Machine Learning |
| CNNs | Convolutional Neural Networks |
| MLP | Multi-Layer Perceptron |
References
- Strange, R.N.; Scott, P.R. Plant disease: A threat to global food security. Annu. Rev. Phytopathol. 2005, 43, 83–116. [Google Scholar] [CrossRef]
- Mohanty, S.P.; Hughes, D.P.; Salathé, M. Using deep learning for image-based plant disease detection. Front. Plant Sci. 2016, 7, 1419. [Google Scholar] [CrossRef]
- Ferentinos, K.P. Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 2018, 145, 311–318. [Google Scholar] [CrossRef]
- Kamilaris, A.; Prenafeta-Boldú, F.X. Deep learning in agriculture: A survey. Comput. Electron. Agric. 2018, 147, 70–90. [Google Scholar] [CrossRef]
- Nair, R.R.; Babu, T. Effective Hazard Categorization of Near-Earth Objects with Random Forest Techniques. Procedia Comput. Sci. 2025, 259, 1682–1692. [Google Scholar] [CrossRef]
- Hughes, D.P.; Salathé, M. An open access repository of images on plant health to enable the development of mobile disease diagnostics through machine learning. arXiv 2015, arXiv:1511.08060. [Google Scholar] [CrossRef]
- Tan, M.; Le, Q. EfficientNetV2: Smaller Models and Faster Training. In Proceedings of the International Conference on Machine Learning (ICML), Virtual, 18–24 July 2021; Available online: https://proceedings.mlr.press/v139/tan21a.html (accessed on 1 April 2025).
- Zhou, X.; Wang, D. A Survey of Attention Mechanisms in Computer Vision. ACM Comput. Surv. 2022, 55, 1–37.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the ECCV 2018, Munich, Germany, 8–14 September 2018; pp. 3–19.
- Snell, J.; Swersky, K.; Zemel, R.S. Prototypical Networks for Few-Shot Learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4077–4087. Available online: https://papers.nips.cc/paper/2017/hash/cb8da6767461f2812ae4290eac7cbc42-Abstract.html (accessed on 1 April 2025).
- Sung, F.; Yang, Y.; Zhang, L.; et al. Learning to Compare: Relation Network for Few-Shot Learning. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 1199–1208. Available online: https://openaccess.thecvf.com/content_cvpr_2018/html/Sung_Learning_to_Compare_CVPR_2018_paper.html (accessed on 1 April 2025).
- Finn, C.; Abbeel, P.; Levine, S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; pp. 1126–1135.
- Wang, Y.; Yao, Q. Generalizing from a Few Examples: A Survey on Few-Shot Learning. ACM Comput. Surv. 2020, 53, 63.
- Babu, T.; Nair, R.R.; S, K. TPU-Accelerated Deep Learning for Accurate Satellite Land Classification. Procedia Comput. Sci. 2025, 258, 1179–1188.
- Pradawet, C.; Khongdee, N.; Pansak, W.; Spreer, W.; Hilger, T.; Cadisch, G. Thermal imaging for assessment of maize water stress and yield prediction under drought conditions. J. Agron. Crop Sci. 2023, 209, 56–70.
- Song, Y.; Wang, T.; Cai, P.; Mondal, S.K.; Sahoo, J.P. A Comprehensive Survey of Few-shot Learning: Evolution, Applications, Challenges, and Opportunities. ACM Comput. Surv. 2023, 55, 271.
- Wen, T.; Li, J.H.; Wang, Q.; Gao, Y.Y.; Hao, G.F.; Song, B.A. Thermal imaging: The digital eye facilitates high-throughput phenotyping traits of plant growth and stress responses. Sci. Total Environ. 2023, 899, 165626.
- Antapurkar, S.; Karadkhele, J.; Kokane, S.R.; Kadam, A. Enhancing Agriculture: Implementing Web Application for Agriculture Solutions from Experts. In Proceedings of the 2024 8th International Conference on Computing, Communication, Control and Automation (ICCUBEA), Pune, India, 23–24 August 2024; pp. 1–5.
- Tan, F.; Mo, X.; Ruan, S.; Yan, T.; Xing, P.; Gao, P.; Xu, W.; Ye, W.; Li, Y.; Gao, X.; et al. Combining Vis-NIR and NIR Spectral Imaging Techniques with Data Fusion for Rapid and Nondestructive Multi-Quality Detection of Cherry Tomatoes. Foods 2023, 12, 3621.
- Nair, R.R.; Babu, T. Evaluating Feature-Based Image Registration Techniques for Aeroplane and Brain MRI Images using Projective Transformation. Procedia Comput. Sci. 2025, 259, 1662–1671.
- Wang, X.; Xiao, Z.; Deng, Z. Swin Attention Augmented Residual Network: A fine-grained pest image recognition method. Front. Plant Sci. 2025, 16, 1619551.
- Yang, M.; Chu, X.; Zhu, J.; Xi, Y.; Niu, S.; Wang, Z. Adaptive federated few-shot feature learning with prototype rectification. Eng. Appl. Artif. Intell. 2023, 126, 107125.
- Vidya, H.A.; Murthy, M.S.N. TWFSL-MM: Few-Shot Learning using Meta-Learning and Metric-Learning for Disease Detection in Azadirachta Indica. Eng. Technol. Appl. Sci. Res. 2025, 15, 21129–21135.
- Ye, F.; Lin, B.; Yue, Z.; Zhang, Y.; Tsang, I.W. Multi-objective meta-learning. Artif. Intell. 2024, 335, 104184.
- Wu, X.; Deng, H.; Wang, Q.; Lei, L.; Gao, Y.; Hao, G. Meta-learning shows great potential in plant disease recognition under few available samples. Plant J. 2023, 114, 767–782.
- Fu, M.; Wang, X.; Wang, J.; Yi, Z. Prototype Bayesian Meta-Learning for Few-Shot Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 7010–7024.
- Wang, Q.; Xiao, Z.; Mao, Y.; Qu, Y.; Shen, J.; Lv, Y.; Ji, X. Model Predictive Task Sampling for Efficient and Robust Adaptation. arXiv 2025, arXiv:2501.11039.
- Zhao, W.; Duan, L.; Ma, B.; Meng, X.; Ren, L.; Ye, D.; Rui, S. Applications of Optimization Methods in Automotive and Agricultural Engineering: A Review. Mathematics 2025, 13, 3018.
- Wang, J.J.; Jing, Y.Y.; Zhang, C.F. Weighting methodologies in multi-criteria evaluations of combined heat and power systems. Int. J. Energy Res. 2009, 33, 1023–1039.
- Yang, S.; Feng, Q.; Zhang, J.; Yang, W.; Zhou, W.; Yan, W. From laboratory to field: Cross-domain few-shot learning for crop disease identification in the field. Front. Plant Sci. 2024, 15, 1434222.
- Shahi, T.B.; Xu, C.Y.; Neupane, A.; Guo, W. Recent Advances in Crop Disease Detection Using UAV and Deep Learning Techniques. Remote Sens. 2023, 15, 2450.
- Hassan, S.M.; Jasinski, M.; Leonowicz, Z.; Jasinska, E.; Maji, A.K. Plant Disease Identification Using Shallow Convolutional Neural Network. Agronomy 2021, 11, 2388.
- Srivastav, A.K.; Das, P. Edge Computing and AI in Agricultural IoT. In Biotechnology and IoT in Agriculture and Food Production: Green Innovation; Apress: Berkeley, CA, USA, 2025; pp. 187–200.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).