Next Article in Journal
Design and Experimental Testing of a Self-Propelled Overhead Rail Air-Assisted Sprayer for Greenhouse
Previous Article in Journal
Ambrosia artemisiifolia in Hungary: A Review of Challenges, Impacts, and Precision Agriculture Approaches for Sustainable Site-Specific Weed Management Using UAV Technologies
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Uncertainty-Aware Deep Learning for Sugarcane Leaf Disease Detection Using Monte Carlo Dropout and MobileNetV3

by
Pathmanaban Pugazhendi
1,
Chetan M. Badgujar
2,*,
Madasamy Raja Ganapathy
3 and
Manikandan Arumugam
4
1
Department of Automobile Engineering, Easwari Engineering College, Chennai 600089, Tamil Nadu, India
2
Biosystems Engineering and Soil Science, The University of Tennessee, Knoxville, TN 37996, USA
3
Department of Information Technology, Paavai Engineering College, Namakkal 637018, Tamil Nadu, India
4
Department of Computer Science and Engineering, Paavai Engineering College, Namakkal 637018, Tamil Nadu, India
*
Author to whom correspondence should be addressed.
AgriEngineering 2026, 8(1), 31; https://doi.org/10.3390/agriengineering8010031
Submission received: 19 November 2025 / Revised: 16 December 2025 / Accepted: 24 December 2025 / Published: 16 January 2026

Abstract

Sugarcane diseases cause estimated global annual losses of over $5 billion. While deep learning shows promise for disease detection, current approaches lack transparency and confidence estimates, limiting their adoption by agricultural stakeholders. We developed an uncertainty-aware detection system integrating Monte Carlo (MC) dropout with MobileNetV3, trained on 2521 images across five categories: Healthy, Mosaic, Red Rot, Rust, and Yellow. The proposed framework achieved 97.23% accuracy with a lightweight architecture comprising 5.4 M parameters. It enabled a 2.3 s inference while generating well-calibrated uncertainty estimates that were 4.0 times higher for misclassifications. High-confidence predictions (>70%) achieved 98.2% accuracy. Gradient-weighted Class Activation Mapping provided interpretable disease localization, and the system was deployed on Hugging Face Spaces for global accessibility. The model demonstrated high recall for the Healthy and Red Rot classes. The model achieved comparatively higher recall for the Healthy and Red Rot classes. The inclusion of uncertainty quantification provides additional information that may support more informed decision-making in precision agriculture applications involving farmers and agronomists.

1. Introduction

Sugarcane (Saccharum spp. hybrids), an important cash crop, is cultivated across more than 100 countries, including tropical and subtropical regions, covering approximately 27 million hectares globally [1]. Sugarcane produces a variety of products and byproducts, not limited to sugar, bioethanol, molasses, and bagasse. Annual global output of primary product sugar reached 1.95 billion tons, generating $75 billion in economic value (i.e., 80% of global supply) in 2024 [2]. Brazil dominates global sugarcane production, with an annual output of 752.9 million tons, followed by India at 405.4 million tons. Together, these two nations account for over 60% of worldwide sugarcane production [3]. The multifaceted economic importance of sugarcane extends beyond sugar production, as pressed cane juice can be used to produce diesel, jet fuel, and other high-value products, while by-products serve in power generation, fertilizers, and agricultural substrates [4].
Despite its economic significance, the sugarcane industry faces substantial challenges due to crop diseases that threaten its productivity and sustainability. Plant-parasitic nematodes are a significant constraint in sugarcane production and can lead to a loss of up to 30% in productivity [5]. The economic impact is staggering, with the sugarcane sector having a market value of more than $56 billion worldwide. If the average productivity loss of sugarcane owing to parasitic nematodes is 10%, then the economic loss is estimated to be more than $5 billion annually [6]. Major plant diseases, including mosaic, rust, red rot, and yellow leaf syndrome, can reduce sugarcane yield by 10–50%, or even 60–80% in severe cases, causing significant losses across producing regions [7]. Traditional disease detection methods rely heavily on visual inspection by domain experts, a process that is labor-intensive, subjective, and often impractical for small-scale farmers in developing regions, where sugarcane cultivation supports millions of livelihoods [8].
The evolution of disease detection methodologies has witnessed a change in thinking with the emergence of computer vision and machine learning technologies [9,10,11]. Smartphone proliferation and deep learning advances enable field-deployable disease diagnosis systems [12]. Deep learning approaches have been remarkably successful in agricultural applications. These techniques, which are based on convolutional neural networks (CNNs), have been successful in achieving high accuracy in identifying various plant lesions from images [13]. The application of CNNs has revolutionized plant pathology, enabling automated feature extraction and classification that surpasses traditional hand-engineered approaches in terms of accuracy and efficiency [14]. Recent studies have demonstrated the effectiveness of deep learning across diverse crop species: depth-wise separable adaptive neural networks achieved robust detection of potato diseases [15], multimodal frameworks combining visual and contextual information improved pepper disease and pest identification [16], and particle swarm optimization with YOLOv8 enhanced tomato plant disease detection performance [17].
Despite technical advances, the adoption of agricultural artificial intelligence (AI) remains below 15% in developing regions [18]. Key identified barriers include, but are not limited to, (1) the absence of confidence measures in predictions, (2) computational requirements exceeding field devices, and (3) lack of interpretability for non-technical users.
The deployment of deep learning models in resource-constrained agricultural environments presents additional challenges for researchers [19]. Traditional and state-of-the-art models have demonstrated good accuracy, but their practicality as end-user solutions remains uncertain owing to current resource limitations [20]. Mobile deployment requires careful consideration of computational constraints, as large models often demand resources that are unavailable under field conditions [21]. The development of lightweight architectures, such as MobileNet, has addressed some of these concerns, with MobileNetV2 for the real-time detection of plant diseases using images obtained from smartphones [22]. Farmer adoption of AI technology faces multifaceted barriers beyond technical limitations. There are multiple barriers to scaling from an industry perspective, such as fragmentation, lack of standard data architecture, and cross-platform interoperability [23]. Economic constraints are paramount, with 47% of respondents citing cost as a top concern, while trust issues stem from concerns about data ownership, algorithmic transparency, and the perceived disconnect between technology developers and agricultural practitioners [24].
Although deep learning has transformed plant disease detection, most existing systems operate as black-box classifiers that provide categorical outputs without confidence. Although such models are accurate under laboratory conditions, they often fail to deliver trustworthy and interpretable decisions in real-world fields. The absence of uncertainty quantification prevents farmers from assessing the reliability of predictions, whereas the high computational demands of conventional architectures, such as ResNet or DenseNet, limit their deployment on mobile or edge devices commonly used in agriculture [25]. These constraints create a persistent gap between algorithmic accuracy and practical usability in precision farming applications.
This study addresses the following research questions:
RQ1: Can uncertainty quantification be effectively integrated into lightweight deep learning architectures for plant disease detection without compromising classification accuracy or computational efficiency?
RQ2: Does prediction uncertainty correlate with classification errors, enabling reliable identification of ambiguous cases that require expert verification?
RQ3: Can an uncertainty-aware disease detection system be deployed on accessible web platforms while maintaining practical inference times for real-world agricultural applications?
To address these research questions, the present study introduces a lightweight, uncertainty-aware sugarcane disease detection framework that integrates Monte Carlo (MC) dropout uncertainty quantification with a MobileNetV3-Large backbone. The main contributions of this study are as follows:
(1) We develop a novel MC-Dropout-MobileNetV3 architecture that provides calibrated confidence estimates without additional parameters or architectural modifications, achieving 97.23% accuracy while maintaining a lightweight footprint (5.4 M parameters) suitable for resource-constrained deployment.
(2) We demonstrate that prediction uncertainty effectively discriminates between reliable and unreliable classifications, with misclassified samples exhibiting 5.38-fold higher uncertainty than correct predictions, enabling risk-stratified decision-making in agricultural applications.
(3) We integrate Gradient-weighted Class Activation Mapping (Grad-CAM) visualization to provide interpretable attention maps highlighting disease-relevant regions, thereby enhancing transparency and user trust in model predictions.
(4) We deploy the complete system on Hugging Face Spaces as a publicly accessible web platform, achieving practical inference times (2.3 s for 10 MC passes) and demonstrating real-world deployment feasibility for global accessibility.
Overall, this study establishes a deployable, interpretable, and uncertainty-aware AI framework for risk-informed disease management in precision agriculture. Table 1 summarizes representative studies in deep learning-based plant disease detection, highlighting their methodological approaches, performance metrics, and key limitations. While existing methods have achieved high classification accuracy, most lack uncertainty quantification capabilities, limiting their trustworthiness in practical agricultural decision-making. Additionally, many state-of-the-art architectures require substantial computational resources, restricting the deployment of resource-constrained devices commonly available to farmers.

2. Materials and Methods

2.1. Data Description

This study utilized the open-source Sugarcane Leaf Disease Dataset [26]. The data were collected from agricultural fields across the state of Maharashtra, India, a region accounting for 35% of India’s sugarcane production. The dataset represents diverse agro-climatic conditions typical of subtropical growing regions, facilitating natural disease occurrenceImages were captured using consumer-grade smartphones (8–48 megapixels) during daylight hours (8:00 AM–5:00 PM) at distances of 10–50 cm from the leaf surfaces. This methodology simulated practical field-deployment conditions with realistic lighting and positioning variations. The dataset included a total of 1638 images evenly distributed among the five disease classes. The detailed dataset statistics are presented in Table 2, and sample images are shown in Figure 1. Each disease class represented pathologically confirmed infections, as verified by plant pathology experts. The image processing step included the removal of blurred images, exclusion of multi-symptom samples, and laboratory verification of ambiguous diagnoses.

2.2. Data Preprocessing and Augmentations

The preprocessing steps included standardizing the input image size while preserving disease-relevant features for an accurate classification. The images were resized to 224 × 224 pixels using bilinear interpolation to match the classification model (i.e., MobileNetV3) requirements while maintaining computational efficiency. Moreover, the images were normalized using standard ImageNet normalization (μ = [0.485, 0.456, 0.406], σ = [0.229, 0.224, 0.225]). Data augmentation strategies can increase the dataset diversity for the training subset and significantly improve the model performance [28]. The augmentation transformations were applied randomly, which included rotations (±10°), horizontal/vertical translations (±10% of dimensions), horizontal flipping (50% probability), zoom scaling (0.9–1.1), and brightness adjustments (±10%). Color manipulations were avoided to preserve diagnostic signatures, such as rust pustule orange hues or red rot discoloration. Augmentation was implemented using TensorFlow’s preprocessing layers for on-the-fly processing, applied only to the training data to provide diverse and challenging subsets for the model development.

2.3. Model Architecture Development

In this study, we propose a hybrid architecture (Figure 2) that combines the MobileNetV3-Large backbone with a custom uncertainty-aware classification head optimized for inference efficiency. The MC dropout MobileNetV3 architecture integrates uncertainty quantification directly into the inference pipeline without architectural modifications or computational overheads. MobileNetV3-Large was used as the feature extractor because of its optimal balance between accuracy and deployment efficiency [29]. The custom classifier head replaces the standard dropout with MC dropout layers that remain active during inference through explicit training mode activation. This design enables Bayesian approximation through multiple stochastic forward passes, transforming the deterministic classifier into an uncertainty-aware system. The classifier progressively reduced the dimensionality from 960 input features through intermediate representations (1280, 640) to five output classes. The hard-swish and ReLU activation functions were used, which provide nonlinearity while maintaining the gradient flow. The dual dropout configuration (rates specified in Table 2) creates multiple stochastic sources that are essential for a robust uncertainty estimation. MobileNetV3-Large was chosen because it provides state-of-the-art accuracy while maintaining low computational complexity and fast inference, which is suitable for smartphones and edge hardware [30,31]. Unlike heavier architectures (ResNet or DenseNet), it achieves an efficient feature representation with minimal latency, allowing real-time deployment for field-based sugarcane disease detection. Furthermore, its modular structure enables the seamless integration of an MC dropout-based uncertainty head, resulting in a hybrid model that combines efficiency, interpretability, and confidence awareness within a single unified framework.

2.4. Model Training Strategy

Model optimization was designed to ensure stable convergence and efficient training within limited computational resources (Table 3). Gradient accumulation was used to simulate larger effective batch sizes under memory constraints, thereby doubling the batch size without additional graphics processing unit (GPU) usage. The AdamW optimizer, which incorporates decoupled weight decay, was employed to achieve improved regularization and prevent overfitting compared with the conventional Adam algorithm. Label smoothing regularization was applied to soften the target distributions, minimize overconfidence in the model predictions, and contribute to well-calibrated uncertainty estimates of the model. A cosine annealing learning rate schedule was implemented to gradually reduce the learning rate following a cosine decay pattern, thereby enabling smooth convergence while avoiding premature stagnation in the local minima. Early stopping was configured with adaptive patience to monitor the validation accuracy, automatically saving the optimal model checkpoint based on MC validation performance rather than standard deterministic accuracy, thereby emphasizing stability and uncertainty-aware optimization throughout training.

2.5. Uncertainty Quantification Framework

The MC dropout was adopted for uncertainty quantification because it introduces no extra parameters yet yields calibrated confidence estimates. During inference, the model performs T stochastic forward passes with dropout activation, thereby generating a distribution of predictions for each input sample. The mc predict method orchestrates this process by aggregating softmax probability outputs across all stochastic passes.
Let f x ; θ t represent the model output for input x under the dropout mask θ t .
The predictive distribution is obtained as
p y x 1 T t 1 T T f x ; θ t
The predictive mean represents the expected class probability, while the predictive variance across the T samples measures model uncertainty:
V a r p y x = 1 T t 1 T T f x ; θ t p ¯ y x 2
where the parameters are defined as follows: x   represents the input image to be classified; y denotes the predicted class label (one of five disease categories: Healthy, Mosaic, Red Rot, Rust, or Yellow); T is the total number of MC forward passes (set to 10 in this study, balancing computational efficiency with uncertainty estimation reliability); θ t represents the stochastic dropout mask applied during the t th forward pass, where neurons are randomly deactivated according to the specified dropout rates (0.2 and 0.3 for the two dropout layers); f x ; θ t denotes the softmax probability output vector of the model for input x under dropout mask θ t ; p y x represents the averaged predictive probability distribution over all T stochastic passes; and p ¯ y x is the mean predictive probability used as the reference for variance calculation. Prediction confidence is computed as the normalized inverse of uncertainty, indicating how stable the model’s outputs are across stochastic passes. This MC dropout approach introduces no architectural modifications or additional parameters, making it computationally efficient and suitable for deployment on resource-limited devices such as mobile phones. The quantified uncertainty serves two essential purposes: (1) identifying ambiguous or unreliable predictions that require expert verification. (2) Calibrating model confidence for risk-aware and interpretable decision-making. The continuous uncertainty values were categorized into three confidence tiers: (1) low (<0.4), (2) medium (0.4–0.7), and (3) high (>0.7), providing actionable guidance for agricultural practitioners in field applications.

2.6. Grad-CAM Interpretability

Grad-CAM implementation provides visual explanations for model predictions, which is crucial for building practitioner trust. The system registers forward and backward hooks on the final convolutional layer of MobileNetV3 to capture the activations and gradients during inference. The global average pooling of gradients generates importance weights, which are combined with activations to produce class-discriminative localization maps. Heatmaps underwent ReLU activation to retain only positive influences, followed by normalization for consistent visualization. The overlay process (α = 0.4) preserved the original image details while highlighting disease-relevant regions, enabling the visual verification of model attention patterns.

2.7. Validation Framework

The robustness of the model was assessed using multiple validation strategies. Stratified k-fold cross-validation was employed to maintain the class distributions across folds and evaluate consistency beyond single train-test splits. Each fold preserved the original model weights, assessing the generalization across different data partitions. Ablation studies systematically remove components to validate architectural contributions. Comparing full MC inference against single forward passes quantified the uncertainty value. Random and majority-class baselines established performance floors, confirming that the learned representations exceeded trivial solutions. Data leakage verification employed multiple random seeds for train-test splitting to assess whether the performance remained consistent across different data partitions. Significant performance variations would indicate overfitting to specific splits rather than genuine pattern learning.

2.8. Performance Analysis

The evaluation framework distinguishes between standard and MC accuracy, capturing both deterministic and probabilistic performances. Uncertainty analysis examined the correlations between prediction confidence and correctness, validating that higher uncertainty coincided with increased error probability. The classification reports provided per-class precision, recall, and F1-scores, identifying disease-specific strengths and weaknesses. Statistical significance testing through McNemar’s test compared model variants, while bootstrap confidence intervals quantified the performance uncertainty. The correlation coefficient between uncertainty and prediction errors validated the reliability of the confidence estimates for downstream decision-making.

2.9. Implementation Details

The system was implemented in PyTorch (2.8), leveraging GPU (NVIDIA A100) acceleration when available, while maintaining CPU compatibility for the deployment scenarios. The modular design separates the model definition, training logic, and evaluation components, facilitating maintenance and extension. Progress tracking through tqdm provides real-time training feedback, and comprehensive logging captures the metrics for post hoc analysis. Memory optimization techniques, including gradient accumulation, cache clearing, and pin memory usage, enabled training in resource-constrained environments. The final model checkpoint preserves not only the weights but also the optimizer state, training configuration, and class mappings, ensuring complete reproducibility.

3. Results

3.1. Training Dynamics and Model Convergence

The MC- Dropout- MobileNetV3 model converged at epoch 21, with a validation accuracy of 95.45%, demonstrating effective transfer learning from the pretrained backbone (Table 4 and Figure 3). The substantial improvement in training accuracy (71.53% to 98.76%), coupled with modest validation gains (86.53% to 95.25%), suggests that the ImageNet weights provided strong initial feature representations, requiring minimal adaptation to sugarcane disease patterns. The consistent 0.4–0.6% superiority of MC validation over standard validation confirms that uncertainty-aware inference improves prediction quality even during training. Training was conducted on Google Colaboratory using a CUDA-enabled GPU with PyTorch 2.8+. The computational environment provided sufficient resources for efficient training, with an average processing time of approximately 40–42 s per epoch, resulting in a total training duration of 17.5 min for 25 complete epochs.

3.2. Overall Classification Performance

The model achieved 97.23% MC accuracy on the held-out test set (n = 505), with uncertainty-aware inference providing a 0.4 percentage point improvement over standard deterministic predictions (96.83%). This positions the system competitively among sugarcane disease detection models in the literature, while uniquely providing calibrated uncertainty estimates (Table 5).
Rigorous validation through 5-fold cross-validation yielded a 99.13% mean accuracy (95% CI: 98.80–99.45%) with remarkably low variance (CV: 0.002), although the 2% gap between cross-validation (99.13%) and test set (97.23%) warrants investigation. The narrow confidence interval and minimal coefficient of variation suggest that the model performance is not dependent on specific training-validation splits. Split robustness testing across five random seeds further confirmed this stability (99.25%, SD = 0.33%), suggesting consistent performance or random initialization effects.
The confusion matrix (Figure 4) reveals a strong diagonal dominance with minimal off-diagonal misclassifications. The 1.9% gap between the cross-validation and test performance suggests a minor distribution shift between the validation folds and the final test set, although both metrics remain well above the practical deployment thresholds. Bootstrap validation (n = 1000) confirmed that these results were not statistical artifacts, with tight confidence bounds validating the reported performance levels.

3.3. Disease-Specific Performance Analysis

The model achieved high recall for the Healthy (1.00) class and high precision for the Mosaic (1.00) class based on 89 and 87 test samples, respectively. Red Rot showed balanced performance with both precision and recall at 0.97 on 116 test samples, the largest class in the test set, indicating strong reliability for these categories, minimizing false positives that could lead to unnecessary treatments and supporting the detection of severe fungal infections (Table 6). This performance is particularly significant, given the economic implications of misdiagnosis in commercial sugarcane cultivation.
A key finding was the correlation between prediction confidence and clinical impact. High-confidence predictions (mean: 0.82) for Healthy and Red Rot samples aligned with their high recall, whereas medium confidence levels for yellow disease (0.49–0.65) appropriately reflected greater diagnostic uncertainty. This calibrated confidence enables risk-aware decision-making, allowing farmers to seek expert validation in uncertain cases.
The radar chart visualization (Figure 5) confirmed a uniformly high performance across all disease classes, with no single disease dominating or underperforming significantly. All metrics exceeded 0.94, indicating that the model learned robust discriminative features for each disease category rather than overfitting to specific visual patterns. The balanced performance profile suggests that the model is ready for comprehensive field deployment across all five disease categories, rather than requiring disease-specific models.

3.4. Uncertainty Analysis

The MC dropout mechanism differentiated between reliable and unreliable predictions, with incorrect classifications exhibiting higher average uncertainty than correct predictions. This trend supports the hypothesis that uncertainty estimates can help flag potential errors (Table 7). The confidence gap of 0.332 between correct (0.82) and incorrect (0.49) predictions demonstrated well-calibrated uncertainty estimates. The statistically significant correlation (r = 0.365, p < 0.001) between uncertainty and prediction errors, while representing a medium effect size, provides a sufficient signal for practical risk stratification. The four-panel uncertainty analysis (Figure 6) revealed that high-confidence predictions (>0.7) achieved 98.2% accuracy, whereas low-confidence predictions (<0.4) dropped to 76.5%, confirming that confidence thresholds can effectively guide trust in model outputs. The calibration curve showed slight under-confidence at high probability ranges, which is a desirable characteristic for safety-agricultural applications, where conservative predictions prevent expensive misdiagnoses. The distinct uncertainty distributions for correct and incorrect predictions (Figure 7) exhibited minimal overlap, enabling practical threshold selection for deployment scenarios. This separation allows the system to automatically flag approximately 15% of the predictions for expert review while maintaining >99% accuracy on the remaining high-confidence cases, enabling selective expert review.
To enhance interpretability for agricultural practitioners, the model’s continuous uncertainty estimates were discretized into three actionable confidence tiers (Table 8). This mapping enables non-expert users to translate probabilistic outputs into clear decision guidance for field-level interventions.
For instance, when a farmer uploads an image of a leaf exhibiting clear rust symptoms, the system may return: “Rust disease detected (high confidence: 85%). Recommended action: apply the appropriate fungicide.” In contrast, for visually ambiguous samples with overlapping symptom patterns, the system provides a conservative response, such as: “Possible yellow disease detected (low confidence: 35%). The prediction is uncertain; please acquire a clearer image or consult an agricultural extension officer before proceeding.” This confidence-based interpretation framework bridges the gap between probabilistic model outputs and practical decision-making, thereby improving trust, usability, and risk-aware adoption in real-world agricultural settings.

3.5. Statistical Validation and Significance Testing

Statistical validation confirmed the model’s overwhelming superiority over baseline approaches, with McNemar’s test yielding χ2 = 388.0 (p < 0.001) against random classification and χ2 = 375.0 (p < 0.001) against majority class prediction (Table 9).
The model demonstrated large effect sizes relative to the baseline classifiers, supporting the hypothesis that the learned representations captured genuine disease patterns. The relatively short training time resulted from efficient transfer learning with ImageNet pre-trained weights, only fine-tuning rather than training from scratch. Receiver Operating Characteristic (ROC) analysis demonstrated a high discrimination capability across all disease classes, with AUC values exceeding 0.99 for each category (Figure 8). These statistical validations collectively demonstrate that the model’s performance is not only practically significant but also statistically robust, with effect sizes and significance levels far exceeding publication thresholds in all comparative analyses.

3.6. Model Comparisons and Ablation Study

The ablation study revealed that MC dropout contributed marginally to raw accuracy (97.23% vs. 96.83% without MC sampling), representing a 0.40 percentage point improvement. However, this modest accuracy gain understates MC dropout’s primary value: transforming a black-box classifier into an uncertainty-aware diagnostic system capable of identifying unreliable predictions (Table 10). The performance comparison visualization (Figure 9) illustrates the model’s advantage, with confidence intervals for the proposed approach entirely separated from the baseline methods. The minimal overlap between standard and MC-enhanced predictions suggests that uncertainty quantification can be added to existing agricultural AI systems with negligible computational overhead while providing substantial value through risk-aware predictions. Notably, the ablation results validate the architectural choices; removing any component (dropout layers, custom head, or transfer learning) resulted in a performance degradation exceeding 5%, confirming that each element contributes meaningfully to the effectiveness of the final system.

3.7. Model Interpretability

Grad-CAM visualization confirmed that the model focused on disease-relevant leaf regions rather than background artifacts or spurious correlations (Figure 10). Attention heatmaps consistently highlighted symptomatic areas, including rust pustules, mosaic patterns, red rot discoloration, and yellow patches, while largely ignoring healthy tissue and image backgrounds. This targeted attention validates that the CNN learned diagnostically meaningful features rather than dataset-specific biases.
Disease-specific attention patterns revealed distinct focus strategies for each condition in the study. Red Rot detection focused on stem-leaf junction regions, where fungal invasion typically initiates, whereas mosaic classification focused on the characteristic light-dark striping patterns across leaf blades. Rust detection focused on small, dispersed regions corresponding to individual pustules, demonstrating the model’s ability to identify fine-grained symptoms. The yellow disease attention maps showed more diffuse patterns, reflecting the systemic nature of nutritional disorders, which aligned with the slightly lower confidence scores for this class.
A notable finding emerged from the correlation between the attention intensity and prediction uncertainty (Figure 11). High-uncertainty predictions exhibited scattered, inconsistent attention patterns (mean attention coherence: 0.42), whereas confident predictions showed focused, concentrated activation on specific symptomatic regions (coherence: 0.81). This relationship suggests that the model’s uncertainty appropriately reflects the ambiguity in visual features rather than random variability. The single-sample analysis demonstrated that for misclassified cases, attention often focused on disease-relevant regions but with lower intensity or split between multiple disease characteristics, providing interpretable explanations for the model errors. This interpretability enables agricultural experts to understand and potentially correct model predictions, thereby building trust essential for practical adoption.

3.8. Web Platform Deployment

The model was deployed via Hugging Face Spaces, achieving a 2.3 ± 0.5 s response time for complete inference (10 MC passes). The interface provides predictions with confidence scores and probability distributions. The system maintained a 99.2% uptime during the three-month testing period (Figure 12). The 2.3 s inference time for 10 MC passes represents a practical trade-off for the intended agricultural use. The primary deployment scenario involves individual leaf diagnosis for treatment decisions rather than high-throughput screening. A farmer examining suspicious plants typically spends considerably more time photographing leaves and interpreting results than the inference duration. For comparison, Deep Ensembles, the primary alternative uncertainty method, would require approximately 11.5 s (5× inference time) for a comparable uncertainty quality. For users requiring faster throughput, a single forward-pass inference (without MC dropout) achieves 0.23 s response with 96.83% accuracy, although without any uncertainty estimates. Agricultural decision-making operates on hourly and daily timescales, whether to apply treatment today versus tomorrow, rather than requiring millisecond-level responses. Regarding deployment accessibility, the current Hugging Face Spaces implementation requires internet connectivity, which is increasingly available in Indian sugarcane-growing regions with >85% mobile penetration in rural Maharashtra, India. The lightweight architecture (5.4 M parameters, ~22 MB model file) is specifically designed to enable future offline deployment on smartphones, representing a planned development direction. The web interface was designed with non-technical users in mind, featuring a single-action workflow (upload image → receive result), color-coded confidence indicators (green for high, yellow for medium, red for low confidence), plain-language recommendations avoiding technical jargon, and no Registration requirements. Successful large-scale deployment would benefit from integration with existing agricultural extension services, including brief training sessions (15–30 min) demonstrating image capture best practices and the interpretation of the results.

4. Discussion

4.1. Performance Analysis and Benchmarking

The proposed MC-Dropout-MobileNetV3 system achieved 97.23% accuracy on the held-out test set, with uncertainty-aware inference providing a marginal 0.4% improvement over the standard deterministic predictions. This performance is competitive with recent sugarcane disease detection studies, although direct comparison remains challenging owing to dataset variability. Howard et al. [26] reported 86.53% accuracy using DenseNet on a similar dataset, while [31] achieved 98.45% using enhanced DenseNet architectures. However, neither system provided uncertainty quantification, which is the primary contribution of this study. The 1.9% gap between the cross-validation performance (99.13%) and test set accuracy (97.23%) suggests mild overfitting despite the regularization strategies used. This discrepancy likely stems from the limited dataset size (2521 images) and the single-region collection. The high cross-validation accuracy with a low variance (SD: 0.26%) indicates that the model learns consistent patterns within the dataset, but generalization to truly unseen data remains untested.

4.2. Uncertainty Quantification Insights

The correlation between prediction uncertainty and error probability (r = 0.365, p < 0.001) represented a medium effect size, sufficient for practical risk stratification but not strong enough for complete reliance on uncertainty estimates alone. The 5.38-fold higher uncertainty for misclassifications shows the discrimination capability, although the overlapping distributions suggest that uncertainty thresholds require careful calibration for deployment scenarios. The observation that high-confidence predictions (>0.7) achieved 98.2% accuracy, whereas low-confidence predictions (<0.4) dropped to 76.5%, validates the utility of confidence-based triage. In practical terms, this enables automated processing for approximately 85% of cases, while flagging 15% for manual review. However, the absence of proper calibration metrics (Expected Calibration Error, Maximum Calibration Error) limits our understanding of whether confidence scores accurately represent true probabilities.
This scenario illustrates the practical value of uncertainty quantification, which extends beyond raw accuracy improvements. Scenario: Pesticide Application Decision. Consider a farmer observing leaf discoloration that could indicate either yellow disease (nutrient deficiency, treatment: fertilizer adjustment, cost: ~$20/hectare) or early-stage Red Rot (fungal infection, treatment: fungicide application, cost: ~$80/hectare + potential crop loss if untreated). Without uncertainty quantification, a deterministic model predicts “Yellow Disease” with no confidence. Farmers apply fertilizers. If the prediction was incorrect and the actual condition was Red Rot, the fungal infection spread, potentially causing a 30–50% yield loss. With our uncertainty-aware system, the model predicts “Yellow Disease” but with LOW confidence (38%), indicating that the prediction is unreliable. Farmers are advised to seek expert verification. Upon closer examination by an agronomist, the condition was correctly identified as early Red Rot, enabling timely fungicide application and preventing significant crop loss. This scenario shows that the 0.4% accuracy improvement from the MC dropout understates its practical value; the critical contribution is knowing when not to trust the prediction, enabling risk-aware decision-making that prevents costly misdiagnoses.

4.3. Architectural Considerations

MobileNetV3-Large provided an effective backbone with only 5.4 M parameters, achieving performance comparable to larger models while maintaining deployment feasibility. The 2.3 s inference time for 10 MC passes represents a reasonable trade-off between uncertainty quality and response time. However, this study did not explore alternative uncertainty quantification methods, such as deep ensembles, temperature scaling, or evidential deep learning, which might provide better-calibrated uncertainties with different computational trade-offs. The dual dropout configuration (0.2, 0.3) was selected through a limited grid search; however, more sophisticated approaches, such as learned dropout rates or spatially adaptive dropout, might improve uncertainty estimates. Additionally, the fixed number of MC samples (T = 10) represents another hyperparameter that can be optimized based on the uncertainty convergence rate. Although this study utilized MC dropout for uncertainty quantification, alternative methodologies exist, each with distinct trade-offs. Deep Ensembles [32], which involve training multiple independent networks (typically five) and aggregating their predictions, offer superior calibration but necessitate a fivefold increase in model parameters and inference time. This requirement renders them impractical for mobile deployment, as our 5.4 M parameter model would expand to 27 M. Bayesian Neural Networks [33] provide principled uncertainty quantification through weight distributions but require specialized variational inference training, double the parameters (to store both the mean and variance per weight), and often exhibit optimization instability. Evidential Deep Learning [34] facilitates single-pass uncertainty estimation via Dirichlet parameterization but requires specialized loss functions and meticulous hyperparameter tuning, with limited validation in agricultural contexts. MC dropout was chosen because of its zero additional parameter introduction, lack of architectural modifications, and seamless integration into existing training pipelines, while offering well-calibrated uncertainty estimates suitable for resource-constrained deployment [35]. A systematic empirical comparison of these methods using agricultural data sets constitutes a promising direction for future research.

4.4. Interpretability Analysis

Grad-CAM visualizations confirmed that the model focused on disease-relevant regions with distinct attention patterns for each condition. The correlation between attention coherence and prediction confidence provides an interpretable signal for the sources of uncertainty. However, the qualitative nature of this analysis limits its rigor. Future work should employ quantitative metrics for attention consistency and explore whether attention-based uncertainty measures could complement MC dropout. The observation that misclassified cases often showed split attention between multiple disease characteristics suggests that the model struggled with ambiguous visual presentations. This finding highlights the importance of uncertainty quantification in cases where visual symptoms overlap between diseases.

4.5. Study Limitations

We also identified a few limitations of this study, including (1) Geographic constraints: The dataset is exclusively representative of Maharashtra, India, which may introduce regional bias and affect generalizability. The sugarcane varieties cultivated in Maharashtra, such as Co 86032 and CoM 0265, differ from those in Brazil (e.g., RB varieties) or Australia, and the presentation of disease symptoms can vary with local agroclimatic conditions, including humidity, temperature, and pathogen strain. However, the uncertainty quantification mechanism partially mitigated this limitation. When the model encounters unfamiliar regional characteristics, an increase in prediction uncertainty is anticipated, thereby automatically flagging potentially unreliable classifications for expert review, rather than producing overconfident errors. Nonetheless, pilot validation with local data is recommended prior to deployment in new agroclimatic zones in the future. (2) Dataset size and diversity: The data used in this study only included 2521 images across five categories, which may not capture the full spectrum of the disease. (3) The study lacks field validation because the model performance can vary under different field conditions, including variable lighting, camera angles, or image quality. Additionally, the model has not been validated under challenging field conditions, including strong direct sunlight, shadows from adjacent plants, moisture on leaf surfaces, or partially occluded leaves. Although the uncertainty quantification mechanism is expected to flag such challenging inputs with increased prediction uncertainty, systematic evaluation under these specific conditions remains essential for future work.
Despite these limitations, this study shows that uncertainty quantification can be integrated into agricultural AI systems without a significant computational burden. Web deployment shows technical feasibility, although adoption barriers extend beyond technology. Economic constraints, digital literacy, and trust in AI recommendations must be considered for successful deployment. The potential value of the system lies in preventing both under-treatment (missing diseases) and over-treatment (unnecessary pesticide application).

4.6. Future Directions

Several research directions could address the current limitations, including (1) dataset expansion with multi-region data collection across diverse sugarcane varieties and environmental conditions to improve generalizability. (2) A comparative evaluation of uncertainty methods (ensembles, evidential networks, conformal prediction) could identify optimal approaches for agricultural applications. (3) Participatory research with farmers examining trust, usability, and the decision-making impact of uncertainty estimates is essential. Longitudinal studies tracking disease management outcomes will provide evidence of practical value. (4) The integration potential of combining visual diagnosis with environmental data (weather, soil conditions), and historical disease patterns could improve prediction confidence and provide context-aware recommendations

5. Conclusions

This study integrated MC dropout uncertainty quantification into a MobileNetV3-based sugarcane disease detection system, achieving a test accuracy of 97.23% across five disease categories. The primary contribution lies in demonstrating that prediction confidence can be quantified in agricultural AI applications without sacrificing accuracy or requiring additional model parameters. The system successfully discriminated between reliable and unreliable predictions, with incorrect classifications showing a 5.38-fold higher uncertainty than correct predictions. The correlation between uncertainty and error probability (r = 0.365, p < 0.001) enabled risk-stratified decision-making, with high-confidence predictions achieving 98.2% accuracy. This allows automated processing for most cases while identifying approximately 15% that require expert review. The lightweight architecture (5.4 M parameters) and web deployment demonstrate the technical feasibility of resource-constrained agricultural settings. However, significant limitations constrain their current applicability. The geographically restricted dataset from Maharashtra, India, prevents claims of broader generalizability of the results. The absence of field validation with agricultural practitioners limits the understanding of practical utility and adoption potential. These constraints must be addressed before large-scale deployment. Future research should prioritize multi-region data collection, comparative evaluations of uncertainty quantification methods, and participatory field trials with farmers. The integration of environmental context and temporal disease progression could enhance the early detection capabilities. While this study establishes technical feasibility, translating uncertainty-aware AI into trusted agricultural practices requires addressing socio-technical factors beyond algorithmic performance. The framework presented in this study provides a foundation for uncertainty-aware plant disease detection systems. As agricultural AI continues to evolve, incorporating prediction confidence alongside accuracy will be essential for building practitioner trust and enabling risk-aware disease management decisions.

Author Contributions

Conceptualization, P.P. and C.M.B.; methodology, P.P.; software, P.P.; validation, P.P., M.R.G. and M.A.; formal analysis, C.M.B.; investigation, P.P.; resources, P.P.; data curation, C.M.B.; writing—original draft preparation, P.P. and C.M.B.; writing—review and editing, C.M.B. and M.R.G.; visualization, M.A.; supervision, M.A.; project administration, C.M.B.; funding acquisition, C.M.B. All authors have read and agreed to the published version of the manuscript.

Funding

Funding for open access to this research was provided by University of Tennessee’s Open Publishing Support Fund.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset generated during this study is publicly available at https://www.kaggle.com/datasets/nirmalsankalana/sugarcane-leaf-disease-dataset (accessed on 20 September 2025). Code availability: https://github.com/pathmanaban86/uncertainty_sugarcane_classifier (accessed on 20 September 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Budeguer, F.; Enrique, R.; Perera, M.F.; Racedo, J.; Castagnaro, A.P.; Noguera, A.S.; Welin, B. Genetic Transformation of Sugarcane, Current Status and Future Prospects. Front. Plant Sci. 2021, 12, 768609. [Google Scholar] [CrossRef] [PubMed]
  2. Dukhnytskyi, B. World Agricultural Production. Ekon. APK. 2024, 26, 59–65. [Google Scholar] [CrossRef]
  3. Akansha Arora. Top-10 Sugarcane Producing Countries in the World 2024. Available online: https://currentaffairs.adda247.com/top-10-sugarcane-producing-countries-in-the-world/ (accessed on 17 October 2025).
  4. Lu, G.; Wang, Z.; Xu, F.; Pan, Y.-B.; Grisham, M.P.; Xu, L. Sugarcane Mosaic Disease: Characteristics, Identification and Control. Microorganisms 2021, 9, 1984. [Google Scholar] [CrossRef]
  5. Huang, W.; Wang, S.; Ge, C.; Wei, L.; Du, D.; Niu, Z.; Li, M.; Zheng, Z. Structural Optimization and Performance Evaluation of a Sugarcane Leaf Mulching Machine. Smart Agric. Technol. 2025, 12, 101116. [Google Scholar] [CrossRef]
  6. Bhuiyan, S.A.; Sherring, K.; Eglinton, J. Parasitic Nematodes of Sugarcane: A Major Productivity Impediment and Grand Challenges in Management. Plant Dis. 2024, 108, 2945–2957. [Google Scholar] [CrossRef]
  7. Viswanathan, R. Impact of Yellow Leaf Disease in Sugarcane and Its Successful Disease Management to Sustain Crop Production. Indian Phytopathol. 2021, 74, 573–586. [Google Scholar] [CrossRef]
  8. Sharma, R.; Rallapalli, S.; Magner, J. Optimizing Water-Efficient Agriculture: Evaluating the Sustainability of Soil Management and Irrigation Synergies Using Fuzzy Extent Analysis. Sci. Rep. 2025, 15, 29382. [Google Scholar] [CrossRef]
  9. Sharma, P.; Sharma, A. A Novel Plant Disease Diagnosis Framework by Integrating Semi-Supervised and Ensemble Learning. J. Plant Dis. Prot. 2024, 131, 177–198. [Google Scholar] [CrossRef]
  10. Pradhan, P.; Kumar, B.; Mohan, S. Comparison of Various Deep Convolutional Neural Network Models to Discriminate Apple Leaf Diseases Using Transfer Learning. J. Plant Dis. Prot. 2022, 129, 1461–1473. [Google Scholar] [CrossRef]
  11. Kunduracioglu, I.; Pacal, I. Advancements in Deep Learning for Accurate Classification of Grape Leaves and Diagnosis of Grape Diseases. J. Plant Dis. Prot. 2024, 131, 1061–1080. [Google Scholar] [CrossRef]
  12. Mohanty, S.P.; Hughes, D.P.; Salathé, M. Using Deep Learning for Image-Based Plant Disease Detection. Front. Plant Sci. 2016, 7, 215232. [Google Scholar] [CrossRef]
  13. Shoaib, M.; Shah, B.; EI-Sappagh, S.; Ali, A.; Ullah, A.; Alenezi, F.; Gechev, T.; Hussain, T.; Ali, F. An Advanced Deep Learning Models-Based Plant Disease Detection: A Review of Recent Research. Front. Plant Sci. 2023, 14, 1158933. [Google Scholar] [CrossRef]
  14. Abdullahi, H.S. Fast and Accurate Image Feature Detection for On-the-Go Field Monitoring Through Precision Agriculture: Computer Predictive Modelling for Farm Image Detection and Classification with Convolution Neural Network (CNN). Ph.D. Thesis, University of Bradford, Bradford, UK, 2020. [Google Scholar]
  15. Kaushik, I.; Prakash, N.; Jain, A. Plant Disease Detection Using a Depth-Wise Separable-Based Adaptive Deep Neural Network. Multimed. Tools Appl. 2025, 84, 887–915. [Google Scholar] [CrossRef]
  16. Liu, J.; Wang, X. A Multimodal Framework for Pepper Diseases and Pests Detection. Sci. Rep. 2024, 14, 28973. [Google Scholar] [CrossRef]
  17. Ayyad, S.M.; Sallam, N.M.; Gamel, S.A.; Ali, Z.H. Particle Swarm Optimization with YOLOv8 for Improved Detection Performance of Tomato Plants. J. Big Data. 2025, 12, 152. [Google Scholar] [CrossRef]
  18. Hernández, S.; López, J.L. Uncertainty Quantification for Plant Disease Detection Using Bayesian Deep Learning. Appl. Soft Comput. 2020, 96, 106597. [Google Scholar] [CrossRef]
  19. Li, W.Z.; Ma, S.C.; Wang, G.Y.; Huo, P.; Zhou, B.C.; Ma, J.Z.; Xie, Y.S.; Guo, C.; Wang, E.Z.; Yang, S. Detection of Sugarcane Stalk Node Based on Improved YOLOv8 and Its Deployment on Edge Device. Smart Agric. Technol. 2025, 12, 101385. [Google Scholar] [CrossRef]
  20. Khan, A.T.; Jensen, S.M.; Khan, A.R.; Li, S. Plant Disease Detection Model for Edge Computing Devices. Front. Plant Sci. 2023, 14, 1308528. [Google Scholar] [CrossRef] [PubMed]
  21. Alemán-Montes, B.; Serra, P.; Zabala, A.; Masó, J.; Pons, X. A near Real-Time Spatial Decision Support System for Improving Sugarcane Monitoring through a Satellite Mapping Web Browser. Smart Agric. Technol. 2025, 12, 101084. [Google Scholar] [CrossRef]
  22. Xu, Y.; Khan, T.M.; Song, Y.; Meijering, E. Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey. Artif. Intell. Rev. 2025, 58, 93. [Google Scholar] [CrossRef]
  23. Limpamont, A.; Kittipanya-ngam, P.; Chindasombatcharoen, N.; Cavite, H.J.M. Towards Agri-food Industry Sustainability: Addressing Agricultural Technology Adoption Challenges through Innovation. Bus. Strategy Environ. 2024, 33, 7352–7367. [Google Scholar] [CrossRef]
  24. Sánchez, E.; Calderón, R.; Herrera, F. Artificial Intelligence Adoption in SMEs: Survey Based on TOE–DOI Framework, Primary Methodology and Challenges. Appl. Sci. 2025, 15, 6465. [Google Scholar] [CrossRef]
  25. Chahbouni, A.; El Manaa, K.; Abouch, Y.; El Manaa, I.; Bossoufi, B.; El Ghzaoui, M.; El Alami, R. Attention-Guided Differentiable Channel Pruning for Efficient Deep Networks. Mach. Learn. Knowl. Extr. 2025, 7, 110. [Google Scholar] [CrossRef]
  26. Daphal, S.D.; Koli, S.M. Enhanced Deep Learning Technique for Sugarcane Leaf Disease Classification and Mobile Application Integration. Heliyon 2024, 10, e29438. [Google Scholar] [CrossRef] [PubMed]
  27. Devi, B.S.; Chatrapati, K.S.; Sandhya, N. Enhanced Sugarcane Disease Detection Using DenseNet201 and DenseNet264 with Transfer Learning and Fine-Tuning. Front. Health Inform. 2024, 13, 687–713. [Google Scholar]
  28. Nitin; Gupta, S.B.; Yadav, R.; Bovand, F.; Tyagi, P.K. Developing Precision Agriculture Using Data Augmentation Framework for Automatic Identification of Castor Insect Pests. Front. Plant Sci. 2023, 14, 1101943. [Google Scholar] [CrossRef] [PubMed]
  29. Whata, A.; Dibeco, K.; Madzima, K.; Obagbuwa, I. Uncertainty Quantification in Multi-Class Image Classification Using Chest X-Ray Images of COVID-19 and Pneumonia. Front. Artif. Intell. 2024, 7, 1410841. [Google Scholar] [CrossRef]
  30. Howard, A.; Sandler, M.; Chen, B.; Wang, W.J.; Chen, L.-C.; Tan, M.X.; Chu, G.; Vasudevan, V.; Zhu, Y.K.; Pang, R.M.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
  31. Hussain, A.; Barua, B.; Osman, A.; Abozariba, R.; Asyhari, A.T. Performance of MobileNetV3 Transfer Learning on Handheld Device-Based Real-Time Tree Species Identification. In Proceedings of the 26th International Conference on Automation and Computing (ICAC 2021), Portsmouth, UK, 15 November 2021; pp. 1–6. [Google Scholar]
  32. Lakshminarayanan, B.; Pritzel, A.; Blundell, C. Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 6405–6416. [Google Scholar]
  33. Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D. Weight Uncertainty in Neural Network. In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), Lille, France, 6–11 July 2015; pp. 1613–1622. [Google Scholar]
  34. Sensoy, M.; Kaplan, L.; Kandemir, M. Evidential Deep Learning to Quantify Classification Uncertainty. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, QC, Canada, 2–8 December 2018; Curran Associates Inc.: Red Hook, NY, USA, 2018; pp. 3183–3193. [Google Scholar]
  35. Gal, Y.; Ghahramani, Z. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA, 20–22 June 2016; Balcan, M.F., Weinberger, K.Q., Eds.; PMLR: New York, NY, USA, 2016; Volume 48, pp. 1050–1059. [Google Scholar]
Figure 1. Sample images of leaf disease: (a) Healthy; (b) Mosaic; (c) Red rot; (d) Rust; (e) Yellow.
Figure 1. Sample images of leaf disease: (a) Healthy; (b) Mosaic; (c) Red rot; (d) Rust; (e) Yellow.
Agriengineering 08 00031 g001
Figure 2. Proposed architecture for the sugar disease identification model.
Figure 2. Proposed architecture for the sugar disease identification model.
Agriengineering 08 00031 g002
Figure 3. Model training dynamics showing the learning curves.
Figure 3. Model training dynamics showing the learning curves.
Agriengineering 08 00031 g003
Figure 4. Confusion matrix of the proposed model: (a) raw counts and (b) normalized.
Figure 4. Confusion matrix of the proposed model: (a) raw counts and (b) normalized.
Agriengineering 08 00031 g004
Figure 5. Radar chart.
Figure 5. Radar chart.
Agriengineering 08 00031 g005
Figure 6. Uncertainty Quantification analysis.
Figure 6. Uncertainty Quantification analysis.
Agriengineering 08 00031 g006
Figure 7. Uncertainty distribution.
Figure 7. Uncertainty distribution.
Agriengineering 08 00031 g007
Figure 8. ROC curves.
Figure 8. ROC curves.
Agriengineering 08 00031 g008
Figure 9. Accuracy comparison. Comparative analysis of proposed method against existing approaches. (a) Accuracy comparison: proposed method (97.23%) outperforms uncertainty-aware approach [18] (96.5%), multimodal framework [16] (94.2%), and sugarcane-specific method [26] (86.53%). (b) Feature comparison matrix showing proposed method uniquely combines uncertainty quantification, mobile optimization, global accessibility, and real-time inference. Values: 1.0 (full support), 0.5 (partial), 0.0 (none).
Figure 9. Accuracy comparison. Comparative analysis of proposed method against existing approaches. (a) Accuracy comparison: proposed method (97.23%) outperforms uncertainty-aware approach [18] (96.5%), multimodal framework [16] (94.2%), and sugarcane-specific method [26] (86.53%). (b) Feature comparison matrix showing proposed method uniquely combines uncertainty quantification, mobile optimization, global accessibility, and real-time inference. Values: 1.0 (full support), 0.5 (partial), 0.0 (none).
Agriengineering 08 00031 g009
Figure 10. Grad CAM Analysis: (a) attention map, (b) disease-specific, and (c) single sample.
Figure 10. Grad CAM Analysis: (a) attention map, (b) disease-specific, and (c) single sample.
Agriengineering 08 00031 g010
Figure 11. Uncertainty-attention correlation.
Figure 11. Uncertainty-attention correlation.
Agriengineering 08 00031 g011
Figure 12. Web deployment screenshot.
Figure 12. Web deployment screenshot.
Agriengineering 08 00031 g012
Table 1. Comparative summary of previous studies on deep learning–based plant disease detection.
Table 1. Comparative summary of previous studies on deep learning–based plant disease detection.
StudyCrop/DatasetMethodAccuracyStrengthsLimitations
Mohanty et al. [12]plant Village (54,306 images, 38 classes)CNN (AlexNet, GoogLeNet)99.35%Large-scale validation; multiple crop speciesNo uncertainty quantification; lab-controlled images only; high computational cost
Shoaib et al. [13]Multiple crops (Review)Various CNN architecturesVariableComprehensive survey of deep learning methodsIdentifies lack of interpretability and confidence measures as key gaps
Kaushik et al. [15]PotatoDepth-wise separable adaptive DNN97.8%Lightweight architecture; adaptive learningSingle crop focus; no confidence calibration; limited interpretability
Liu & Wang [16]PepperMultimodal framework94.2%Combines visual and contextual featuresComplex multi-input pipeline; no uncertainty estimation; resource demanding
Ayyad et al. [17]TomatoPSO + YOLOv896.8%Optimized detection; real-time capabilityDetection (not classification) focus; no uncertainty quantification; requires GPU
Hernández & López [18]Plant VillageBayesian Deep Learning96.5%Uncertainty quantification capabilityComputationally expensive; complex implementation; not optimized for mobile deployment
Daphal & Koli [26]Sugarcane (Maharashtra)DenseNet86.53%Sugarcane-specific; mobile app integrationLower accuracy; no uncertainty estimates; heavy architecture
Devi et al. [27]SugarcaneDenseNet201/264 with transfer learning98.45%High accuracy; fine-tuning approachNo uncertainty quantification; computationally intensive; black-box predictions
Proposed MethodSugarcane (5 classes)MC-Dropout-MobileNetV397.23%Uncertainty quantification; lightweight (5.4 M params); interpretable (Grad-CAM); web-deployable; 2.3 s inferenceSingle-region dataset; no field validation yet
Table 2. Characteristics of the sugarcane leaf disease dataset used in this study.
Table 2. Characteristics of the sugarcane leaf disease dataset used in this study.
Disease ClassTrainsetValidation SetTest SetTotal ImagesClass Distribution (%)Image Quality
Healthy339788950617.6High
Mosaic301698745717.2Variable
Red Rot3367811653023.0High
Rust3347710351420.4High
Yellow3287611051421.8Variable
Total16383785052521100Mixed
Table 3. Model training configuration and hyperparameters.
Table 3. Model training configuration and hyperparameters.
ComponentParameterValueJustification
ArchitectureBase ModelMobileNetV3-LargeLightweight (5.4 M params) for mobile deployment
Classifier Dimensions960→1280→640→5Progressive dimensionality reduction
MC Dropout Rates0.2, 0.3Balanced uncertainty-accuracy trade-off
Data SplitTraining/Validation80%/20%Standard split for model evaluation
Batch Size32 (×2 accumulation)Memory-efficient with effective size of 64
OptimizationOptimizerAdamWSuperior weight decay regularization
Initial Learning Rate1 × 10−3Standard for transfer learning
Weight Decay1 × 10−4L2 regularization to prevent overfitting
Label Smoothing0.1Improved generalization
SchedulingLR SchedulerCosine Annealing LRSmooth convergence
T_max100 epochsFull cosine cycle
Early Stopping15 epochsPrevent overfitting
AugmentationRandom Crop Scale0.8–1.0Preserve disease features
Rotation Range±30°Natural variation
Color Jitter0.3 (B,C,S), 0.1 (H)Lighting invariance
UncertaintyMC Samples10Balance speed-reliability
Confidence Thresholds<0.4, 0.4–0.7, >0.7Low/Medium/High uncertainty bins
TrainingMax Epochs100Sufficient for convergence
Gradient Clipping1.0Stability during training
Workers2Optimal for Colab environment
Note: Dropout rates were selected through preliminary experiments on validation data (assessed: 0.1–0.5 in 0.1 increments). The asymmetric configuration (0.2, 0.3) achieved an optimal balance between classification accuracy and uncertainty-error correlation. Lower rates (<0.2) provided insufficient stochasticity for meaningful uncertainty differentiation, while higher rates (>0.4) degraded accuracy by approximately 1.5–2%. MC samples (T = 10) were selected following literature recommendations [1], achieving stable uncertainty estimates with practical inference time (2.3 s); higher values (T > 15) yielded marginal improvement (<0.1%) with proportionally increased computational cost.
Table 4. Summary of training metrics.
Table 4. Summary of training metrics.
MetricInitial
(Epoch 1)
Peak ValueFinal
(Epoch 25)
Best Model
Training Accuracy71.53%99.11% (Epoch 23)98.76%-
Validation Standard Acc85.35%95.45% (Epoch 21, 24)95.45%Epoch 21
Validation MC Accuracy86.53%95.45% (Epoch 21)95.25%Epoch 21
Training Loss0.94380.4159 (Epoch 23)0.4173-
Validation Loss0.76800.4870 (Epoch 25)0.48700.4910 (Epoch 21)
Learning Rate1.000 × 10−31.000 × 10−3 (Epoch 1)4.000 × 10−69.500 × 10−5 (Epoch 21)
Overfitting Gap13.82%3.66% (Epoch 23)3.51%3.00% (Epoch 21)
Note: The best model was selected at epoch 21 based on the highest validation MC accuracy (95.45%). The training continued through Epoch 25 without stopping early.
Table 5. Comprehensive Model Validation and Performance Analysis.
Table 5. Comprehensive Model Validation and Performance Analysis.
Validation MethodMetricValue95% CISEStatus
Cross-Validation
(5-fold)
Mean Accuracy99.13%[98.80%, 99.45%]0.12%Excellent
Standard Deviation0.26%--Very Stable
Coefficient of
Variation
0.002--Minimal Variance
Split Robustness
(5 seeds)
Mean Accuracy99.25%[98.84%, 99.65%]0.15%Highly Robust
Standard Deviation0.33%--Consistent
Bootstrap
Validation
Cross-Validation
(n = 1000)
99.13%[98.93%, 99.32%]0.10%Validated
Split Robustness
(n = 1000)
99.25%[98.97%, 99.49%]0.13%Confirmed
Final Test
Performance
MC Accuracy97.23%--High Performance
Standard Accuracy96.83%--Strong Baseline
Table 6. Disease-Specific Classification Performance.
Table 6. Disease-Specific Classification Performance.
Disease ClassPrecisionRecallF1-ScoreSupport
Healthy0.981.000.99103
Mosaic0.980.950.97105
Red Rot0.981.000.99100
Rust0.980.950.97105
Yellow0.940.960.9592
Overall0.970.970.97505
Table 7. Uncertainty Quantification and Correlation Analysis.
Table 7. Uncertainty Quantification and Correlation Analysis.
Uncertainty MetricValue95% CIStatistical Testp-ValueInterpretation
Prediction Uncertainty
Mean (Correct Predictions)0.0008---Very Low Uncertainty
Mean (Incorrect Predictions)0.0043---Higher Uncertainty
Uncertainty Separation5.38 × higher---Good Discrimination
Confidence Analysis
Mean Confidence (Correct)0.8199---High Confidence
Mean Confidence (Incorrect)0.4879---Lower Confidence
Confidence Gap0.332---Clear Separation
Correlation Testing
Uncertainty-Error Correlationr = 0.365[0.287, 0.439]t = 8.801 (df = 503)<0.001 ***Medium Effect Size
Sample Sizen = 505---Adequate Power
Note: *** p < 0.001 (two-tailed test). Stratified by confidence bins: low confidence (<0.4), accuracy = 76.5% (n ≈ 75); medium confidence (0.4–0.7), accuracy = 94.1% (n ≈ 150); high confidence (>0.7), accuracy = 98.2% (n ≈ 280).
Table 8. User-oriented interpretation of prediction confidence levels.
Table 8. User-oriented interpretation of prediction confidence levels.
Confidence LevelProbability ThresholdRecommended User ActionObserved Accuracy (%)
High>70%Proceed with the recommended treatment98.2
Medium40–70%Exercise caution; expert consultation advised94.1
Low<40%Do not act without verification; seek expert assessment76.5
Table 9. Statistical Significance Testing and Effect Size Analysis.
Table 9. Statistical Significance Testing and Effect Size Analysis.
Comparison TestTest Statisticp-ValueEffect Size95% CIInterpretation
One-Sample t-Tests
Model vs. Random Baselinet = 672.4 (df = 4)<0.001 ***Cohen’s d = 300.7-Extremely Large Effect
Model vs. Majority Classt = 648.9 (df = 4)<0.001 ***Cohen’s d = 290.2-Extremely Large Effect
Paired Comparison
MC vs. StandardW = 0.00.062Δ = 0.39%-Marginal Improvement
McNemar’s Tests
Model vs. Randomχ2 = 388.0 (df = 1)<0.001 ***--Statistically Significant (p < 0.001)
Model vs. Majority Classχ2 = 375.0 (df = 1)<0.001 ***--Statistically Significant (p < 0.001)
Note: *** p < 0.001. Effect sizes: d = Cohen’s d, Δ = mean difference.
Table 10. Ablation Study and Baseline Comparisons.
Table 10. Ablation Study and Baseline Comparisons.
Model ConfigurationAccuracy
(%)
Improvement over Random (%)Improvement over Majority (%)Key Features
Full Model (Ours)97.23+77.23+74.46MC dropout + Uncertainty
Without MC Sampling96.83+76.83+74.06Standard Inference
Random Baseline20.00-−2.77Theoretical Lower Bound
Majority Class22.77+2.77-Always Predict “Healthy”
Final Test Performance97.23+77.23+74.46Real-world Validation
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Pugazhendi, P.; Badgujar, C.M.; Ganapathy, M.R.; Arumugam, M. Uncertainty-Aware Deep Learning for Sugarcane Leaf Disease Detection Using Monte Carlo Dropout and MobileNetV3. AgriEngineering 2026, 8, 31. https://doi.org/10.3390/agriengineering8010031

AMA Style

Pugazhendi P, Badgujar CM, Ganapathy MR, Arumugam M. Uncertainty-Aware Deep Learning for Sugarcane Leaf Disease Detection Using Monte Carlo Dropout and MobileNetV3. AgriEngineering. 2026; 8(1):31. https://doi.org/10.3390/agriengineering8010031

Chicago/Turabian Style

Pugazhendi, Pathmanaban, Chetan M. Badgujar, Madasamy Raja Ganapathy, and Manikandan Arumugam. 2026. "Uncertainty-Aware Deep Learning for Sugarcane Leaf Disease Detection Using Monte Carlo Dropout and MobileNetV3" AgriEngineering 8, no. 1: 31. https://doi.org/10.3390/agriengineering8010031

APA Style

Pugazhendi, P., Badgujar, C. M., Ganapathy, M. R., & Arumugam, M. (2026). Uncertainty-Aware Deep Learning for Sugarcane Leaf Disease Detection Using Monte Carlo Dropout and MobileNetV3. AgriEngineering, 8(1), 31. https://doi.org/10.3390/agriengineering8010031

Article Metrics

Article metric data becomes available approximately 24 hours after publication online.
Back to TopTop