Uncertainty-Aware Deep Learning for Sugarcane Leaf Disease Detection Using Monte Carlo Dropout and MobileNetV3

Pugazhendi, Pathmanaban; Badgujar, Chetan M.; Ganapathy, Madasamy Raja; Arumugam, Manikandan

doi:10.3390/agriengineering8010031

Open AccessArticle

Uncertainty-Aware Deep Learning for Sugarcane Leaf Disease Detection Using Monte Carlo Dropout and MobileNetV3

by

Pathmanaban Pugazhendi

¹

,

Chetan M. Badgujar

^2,*,

Madasamy Raja Ganapathy

³

and

Manikandan Arumugam

⁴

¹

Department of Automobile Engineering, Easwari Engineering College, Chennai 600089, Tamil Nadu, India

²

Biosystems Engineering and Soil Science, The University of Tennessee, Knoxville, TN 37996, USA

³

Department of Information Technology, Paavai Engineering College, Namakkal 637018, Tamil Nadu, India

⁴

Department of Computer Science and Engineering, Paavai Engineering College, Namakkal 637018, Tamil Nadu, India

^*

Author to whom correspondence should be addressed.

AgriEngineering 2026, 8(1), 31; https://doi.org/10.3390/agriengineering8010031

Submission received: 19 November 2025 / Revised: 16 December 2025 / Accepted: 24 December 2025 / Published: 16 January 2026

(This article belongs to the Section Computer Applications and Artificial Intelligence in Agriculture)

Download

Browse Figures

Versions Notes

Abstract

Sugarcane diseases cause estimated global annual losses of over $5 billion. While deep learning shows promise for disease detection, current approaches lack transparency and confidence estimates, limiting their adoption by agricultural stakeholders. We developed an uncertainty-aware detection system integrating Monte Carlo (MC) dropout with MobileNetV3, trained on 2521 images across five categories: Healthy, Mosaic, Red Rot, Rust, and Yellow. The proposed framework achieved 97.23% accuracy with a lightweight architecture comprising 5.4 M parameters. It enabled a 2.3 s inference while generating well-calibrated uncertainty estimates that were 4.0 times higher for misclassifications. High-confidence predictions (>70%) achieved 98.2% accuracy. Gradient-weighted Class Activation Mapping provided interpretable disease localization, and the system was deployed on Hugging Face Spaces for global accessibility. The model demonstrated high recall for the Healthy and Red Rot classes. The model achieved comparatively higher recall for the Healthy and Red Rot classes. The inclusion of uncertainty quantification provides additional information that may support more informed decision-making in precision agriculture applications involving farmers and agronomists.

Keywords:

sugarcane diseases; deep learning; uncertainty quantification; Monte Carlo dropout; precision agriculture; disease detection

1. Introduction

Sugarcane (Saccharum spp. hybrids), an important cash crop, is cultivated across more than 100 countries, including tropical and subtropical regions, covering approximately 27 million hectares globally [1]. Sugarcane produces a variety of products and byproducts, not limited to sugar, bioethanol, molasses, and bagasse. Annual global output of primary product sugar reached 1.95 billion tons, generating $75 billion in economic value (i.e., 80% of global supply) in 2024 [2]. Brazil dominates global sugarcane production, with an annual output of 752.9 million tons, followed by India at 405.4 million tons. Together, these two nations account for over 60% of worldwide sugarcane production [3]. The multifaceted economic importance of sugarcane extends beyond sugar production, as pressed cane juice can be used to produce diesel, jet fuel, and other high-value products, while by-products serve in power generation, fertilizers, and agricultural substrates [4].

Despite its economic significance, the sugarcane industry faces substantial challenges due to crop diseases that threaten its productivity and sustainability. Plant-parasitic nematodes are a significant constraint in sugarcane production and can lead to a loss of up to 30% in productivity [5]. The economic impact is staggering, with the sugarcane sector having a market value of more than $56 billion worldwide. If the average productivity loss of sugarcane owing to parasitic nematodes is 10%, then the economic loss is estimated to be more than $5 billion annually [6]. Major plant diseases, including mosaic, rust, red rot, and yellow leaf syndrome, can reduce sugarcane yield by 10–50%, or even 60–80% in severe cases, causing significant losses across producing regions [7]. Traditional disease detection methods rely heavily on visual inspection by domain experts, a process that is labor-intensive, subjective, and often impractical for small-scale farmers in developing regions, where sugarcane cultivation supports millions of livelihoods [8].

The evolution of disease detection methodologies has witnessed a change in thinking with the emergence of computer vision and machine learning technologies [9,10,11]. Smartphone proliferation and deep learning advances enable field-deployable disease diagnosis systems [12]. Deep learning approaches have been remarkably successful in agricultural applications. These techniques, which are based on convolutional neural networks (CNNs), have been successful in achieving high accuracy in identifying various plant lesions from images [13]. The application of CNNs has revolutionized plant pathology, enabling automated feature extraction and classification that surpasses traditional hand-engineered approaches in terms of accuracy and efficiency [14]. Recent studies have demonstrated the effectiveness of deep learning across diverse crop species: depth-wise separable adaptive neural networks achieved robust detection of potato diseases [15], multimodal frameworks combining visual and contextual information improved pepper disease and pest identification [16], and particle swarm optimization with YOLOv8 enhanced tomato plant disease detection performance [17].

Despite technical advances, the adoption of agricultural artificial intelligence (AI) remains below 15% in developing regions [18]. Key identified barriers include, but are not limited to, (1) the absence of confidence measures in predictions, (2) computational requirements exceeding field devices, and (3) lack of interpretability for non-technical users.

The deployment of deep learning models in resource-constrained agricultural environments presents additional challenges for researchers [19]. Traditional and state-of-the-art models have demonstrated good accuracy, but their practicality as end-user solutions remains uncertain owing to current resource limitations [20]. Mobile deployment requires careful consideration of computational constraints, as large models often demand resources that are unavailable under field conditions [21]. The development of lightweight architectures, such as MobileNet, has addressed some of these concerns, with MobileNetV2 for the real-time detection of plant diseases using images obtained from smartphones [22]. Farmer adoption of AI technology faces multifaceted barriers beyond technical limitations. There are multiple barriers to scaling from an industry perspective, such as fragmentation, lack of standard data architecture, and cross-platform interoperability [23]. Economic constraints are paramount, with 47% of respondents citing cost as a top concern, while trust issues stem from concerns about data ownership, algorithmic transparency, and the perceived disconnect between technology developers and agricultural practitioners [24].

Although deep learning has transformed plant disease detection, most existing systems operate as black-box classifiers that provide categorical outputs without confidence. Although such models are accurate under laboratory conditions, they often fail to deliver trustworthy and interpretable decisions in real-world fields. The absence of uncertainty quantification prevents farmers from assessing the reliability of predictions, whereas the high computational demands of conventional architectures, such as ResNet or DenseNet, limit their deployment on mobile or edge devices commonly used in agriculture [25]. These constraints create a persistent gap between algorithmic accuracy and practical usability in precision farming applications.

This study addresses the following research questions:

RQ1: Can uncertainty quantification be effectively integrated into lightweight deep learning architectures for plant disease detection without compromising classification accuracy or computational efficiency?

RQ2: Does prediction uncertainty correlate with classification errors, enabling reliable identification of ambiguous cases that require expert verification?

RQ3: Can an uncertainty-aware disease detection system be deployed on accessible web platforms while maintaining practical inference times for real-world agricultural applications?

To address these research questions, the present study introduces a lightweight, uncertainty-aware sugarcane disease detection framework that integrates Monte Carlo (MC) dropout uncertainty quantification with a MobileNetV3-Large backbone. The main contributions of this study are as follows:

(1) We develop a novel MC-Dropout-MobileNetV3 architecture that provides calibrated confidence estimates without additional parameters or architectural modifications, achieving 97.23% accuracy while maintaining a lightweight footprint (5.4 M parameters) suitable for resource-constrained deployment.

(2) We demonstrate that prediction uncertainty effectively discriminates between reliable and unreliable classifications, with misclassified samples exhibiting 5.38-fold higher uncertainty than correct predictions, enabling risk-stratified decision-making in agricultural applications.

(3) We integrate Gradient-weighted Class Activation Mapping (Grad-CAM) visualization to provide interpretable attention maps highlighting disease-relevant regions, thereby enhancing transparency and user trust in model predictions.

(4) We deploy the complete system on Hugging Face Spaces as a publicly accessible web platform, achieving practical inference times (2.3 s for 10 MC passes) and demonstrating real-world deployment feasibility for global accessibility.

Overall, this study establishes a deployable, interpretable, and uncertainty-aware AI framework for risk-informed disease management in precision agriculture. Table 1 summarizes representative studies in deep learning-based plant disease detection, highlighting their methodological approaches, performance metrics, and key limitations. While existing methods have achieved high classification accuracy, most lack uncertainty quantification capabilities, limiting their trustworthiness in practical agricultural decision-making. Additionally, many state-of-the-art architectures require substantial computational resources, restricting the deployment of resource-constrained devices commonly available to farmers.

2. Materials and Methods

2.1. Data Description

This study utilized the open-source Sugarcane Leaf Disease Dataset [26]. The data were collected from agricultural fields across the state of Maharashtra, India, a region accounting for 35% of India’s sugarcane production. The dataset represents diverse agro-climatic conditions typical of subtropical growing regions, facilitating natural disease occurrenceImages were captured using consumer-grade smartphones (8–48 megapixels) during daylight hours (8:00 AM–5:00 PM) at distances of 10–50 cm from the leaf surfaces. This methodology simulated practical field-deployment conditions with realistic lighting and positioning variations. The dataset included a total of 1638 images evenly distributed among the five disease classes. The detailed dataset statistics are presented in Table 2, and sample images are shown in Figure 1. Each disease class represented pathologically confirmed infections, as verified by plant pathology experts. The image processing step included the removal of blurred images, exclusion of multi-symptom samples, and laboratory verification of ambiguous diagnoses.

2.2. Data Preprocessing and Augmentations

The preprocessing steps included standardizing the input image size while preserving disease-relevant features for an accurate classification. The images were resized to 224 × 224 pixels using bilinear interpolation to match the classification model (i.e., MobileNetV3) requirements while maintaining computational efficiency. Moreover, the images were normalized using standard ImageNet normalization (μ = [0.485, 0.456, 0.406], σ = [0.229, 0.224, 0.225]). Data augmentation strategies can increase the dataset diversity for the training subset and significantly improve the model performance [28]. The augmentation transformations were applied randomly, which included rotations (±10°), horizontal/vertical translations (±10% of dimensions), horizontal flipping (50% probability), zoom scaling (0.9–1.1), and brightness adjustments (±10%). Color manipulations were avoided to preserve diagnostic signatures, such as rust pustule orange hues or red rot discoloration. Augmentation was implemented using TensorFlow’s preprocessing layers for on-the-fly processing, applied only to the training data to provide diverse and challenging subsets for the model development.

2.3. Model Architecture Development

In this study, we propose a hybrid architecture (Figure 2) that combines the MobileNetV3-Large backbone with a custom uncertainty-aware classification head optimized for inference efficiency. The MC dropout MobileNetV3 architecture integrates uncertainty quantification directly into the inference pipeline without architectural modifications or computational overheads. MobileNetV3-Large was used as the feature extractor because of its optimal balance between accuracy and deployment efficiency [29]. The custom classifier head replaces the standard dropout with MC dropout layers that remain active during inference through explicit training mode activation. This design enables Bayesian approximation through multiple stochastic forward passes, transforming the deterministic classifier into an uncertainty-aware system. The classifier progressively reduced the dimensionality from 960 input features through intermediate representations (1280, 640) to five output classes. The hard-swish and ReLU activation functions were used, which provide nonlinearity while maintaining the gradient flow. The dual dropout configuration (rates specified in Table 2) creates multiple stochastic sources that are essential for a robust uncertainty estimation. MobileNetV3-Large was chosen because it provides state-of-the-art accuracy while maintaining low computational complexity and fast inference, which is suitable for smartphones and edge hardware [30,31]. Unlike heavier architectures (ResNet or DenseNet), it achieves an efficient feature representation with minimal latency, allowing real-time deployment for field-based sugarcane disease detection. Furthermore, its modular structure enables the seamless integration of an MC dropout-based uncertainty head, resulting in a hybrid model that combines efficiency, interpretability, and confidence awareness within a single unified framework.

2.4. Model Training Strategy

Model optimization was designed to ensure stable convergence and efficient training within limited computational resources (Table 3). Gradient accumulation was used to simulate larger effective batch sizes under memory constraints, thereby doubling the batch size without additional graphics processing unit (GPU) usage. The AdamW optimizer, which incorporates decoupled weight decay, was employed to achieve improved regularization and prevent overfitting compared with the conventional Adam algorithm. Label smoothing regularization was applied to soften the target distributions, minimize overconfidence in the model predictions, and contribute to well-calibrated uncertainty estimates of the model. A cosine annealing learning rate schedule was implemented to gradually reduce the learning rate following a cosine decay pattern, thereby enabling smooth convergence while avoiding premature stagnation in the local minima. Early stopping was configured with adaptive patience to monitor the validation accuracy, automatically saving the optimal model checkpoint based on MC validation performance rather than standard deterministic accuracy, thereby emphasizing stability and uncertainty-aware optimization throughout training.

2.5. Uncertainty Quantification Framework

The MC dropout was adopted for uncertainty quantification because it introduces no extra parameters yet yields calibrated confidence estimates. During inference, the model performs T stochastic forward passes with dropout activation, thereby generating a distribution of predictions for each input sample. The mc predict method orchestrates this process by aggregating softmax probability outputs across all stochastic passes.

Let

f (x; θ t)

represent the model output for input

x

under the dropout mask

θ t

.

The predictive distribution is obtained as

p (y∣ x) \approx \frac{1}{T} \sum_{t - 1}^{T} T f (x; θ t)

(1)

The predictive mean represents the expected class probability, while the predictive variance across the

T

samples measures model uncertainty:

V a r [p (y∣ x)] = \frac{1}{T} \sum_{t - 1}^{T} T {(f (x; θ t) - \bar{p} (y∣ x))}^{2}

(2)

where the parameters are defined as follows:

x

represents the input image to be classified;

y

denotes the predicted class label (one of five disease categories: Healthy, Mosaic, Red Rot, Rust, or Yellow);

T

is the total number of MC forward passes (set to 10 in this study, balancing computational efficiency with uncertainty estimation reliability);

θ t

represents the stochastic dropout mask applied during the

t

_th forward pass, where neurons are randomly deactivated according to the specified dropout rates (0.2 and 0.3 for the two dropout layers);

f (x; θ t)

denotes the softmax probability output vector of the model for input

x

under dropout mask

θ t

;

p (y∣ x)

represents the averaged predictive probability distribution over all

T

stochastic passes; and

\bar{p} (y∣ x)

is the mean predictive probability used as the reference for variance calculation. Prediction confidence is computed as the normalized inverse of uncertainty, indicating how stable the model’s outputs are across stochastic passes. This MC dropout approach introduces no architectural modifications or additional parameters, making it computationally efficient and suitable for deployment on resource-limited devices such as mobile phones. The quantified uncertainty serves two essential purposes: (1) identifying ambiguous or unreliable predictions that require expert verification. (2) Calibrating model confidence for risk-aware and interpretable decision-making. The continuous uncertainty values were categorized into three confidence tiers: (1) low (<0.4), (2) medium (0.4–0.7), and (3) high (>0.7), providing actionable guidance for agricultural practitioners in field applications.

2.6. Grad-CAM Interpretability

Grad-CAM implementation provides visual explanations for model predictions, which is crucial for building practitioner trust. The system registers forward and backward hooks on the final convolutional layer of MobileNetV3 to capture the activations and gradients during inference. The global average pooling of gradients generates importance weights, which are combined with activations to produce class-discriminative localization maps. Heatmaps underwent ReLU activation to retain only positive influences, followed by normalization for consistent visualization. The overlay process (α = 0.4) preserved the original image details while highlighting disease-relevant regions, enabling the visual verification of model attention patterns.

2.7. Validation Framework

The robustness of the model was assessed using multiple validation strategies. Stratified k-fold cross-validation was employed to maintain the class distributions across folds and evaluate consistency beyond single train-test splits. Each fold preserved the original model weights, assessing the generalization across different data partitions. Ablation studies systematically remove components to validate architectural contributions. Comparing full MC inference against single forward passes quantified the uncertainty value. Random and majority-class baselines established performance floors, confirming that the learned representations exceeded trivial solutions. Data leakage verification employed multiple random seeds for train-test splitting to assess whether the performance remained consistent across different data partitions. Significant performance variations would indicate overfitting to specific splits rather than genuine pattern learning.

2.8. Performance Analysis

The evaluation framework distinguishes between standard and MC accuracy, capturing both deterministic and probabilistic performances. Uncertainty analysis examined the correlations between prediction confidence and correctness, validating that higher uncertainty coincided with increased error probability. The classification reports provided per-class precision, recall, and F1-scores, identifying disease-specific strengths and weaknesses. Statistical significance testing through McNemar’s test compared model variants, while bootstrap confidence intervals quantified the performance uncertainty. The correlation coefficient between uncertainty and prediction errors validated the reliability of the confidence estimates for downstream decision-making.

2.9. Implementation Details

The system was implemented in PyTorch (2.8), leveraging GPU (NVIDIA A100) acceleration when available, while maintaining CPU compatibility for the deployment scenarios. The modular design separates the model definition, training logic, and evaluation components, facilitating maintenance and extension. Progress tracking through tqdm provides real-time training feedback, and comprehensive logging captures the metrics for post hoc analysis. Memory optimization techniques, including gradient accumulation, cache clearing, and pin memory usage, enabled training in resource-constrained environments. The final model checkpoint preserves not only the weights but also the optimizer state, training configuration, and class mappings, ensuring complete reproducibility.

3. Results

3.1. Training Dynamics and Model Convergence

The MC- Dropout- MobileNetV3 model converged at epoch 21, with a validation accuracy of 95.45%, demonstrating effective transfer learning from the pretrained backbone (Table 4 and Figure 3). The substantial improvement in training accuracy (71.53% to 98.76%), coupled with modest validation gains (86.53% to 95.25%), suggests that the ImageNet weights provided strong initial feature representations, requiring minimal adaptation to sugarcane disease patterns. The consistent 0.4–0.6% superiority of MC validation over standard validation confirms that uncertainty-aware inference improves prediction quality even during training. Training was conducted on Google Colaboratory using a CUDA-enabled GPU with PyTorch 2.8+. The computational environment provided sufficient resources for efficient training, with an average processing time of approximately 40–42 s per epoch, resulting in a total training duration of 17.5 min for 25 complete epochs.

3.2. Overall Classification Performance

The model achieved 97.23% MC accuracy on the held-out test set (n = 505), with uncertainty-aware inference providing a 0.4 percentage point improvement over standard deterministic predictions (96.83%). This positions the system competitively among sugarcane disease detection models in the literature, while uniquely providing calibrated uncertainty estimates (Table 5).

Rigorous validation through 5-fold cross-validation yielded a 99.13% mean accuracy (95% CI: 98.80–99.45%) with remarkably low variance (CV: 0.002), although the 2% gap between cross-validation (99.13%) and test set (97.23%) warrants investigation. The narrow confidence interval and minimal coefficient of variation suggest that the model performance is not dependent on specific training-validation splits. Split robustness testing across five random seeds further confirmed this stability (99.25%, SD = 0.33%), suggesting consistent performance or random initialization effects.

The confusion matrix (Figure 4) reveals a strong diagonal dominance with minimal off-diagonal misclassifications. The 1.9% gap between the cross-validation and test performance suggests a minor distribution shift between the validation folds and the final test set, although both metrics remain well above the practical deployment thresholds. Bootstrap validation (n = 1000) confirmed that these results were not statistical artifacts, with tight confidence bounds validating the reported performance levels.

3.3. Disease-Specific Performance Analysis

The model achieved high recall for the Healthy (1.00) class and high precision for the Mosaic (1.00) class based on 89 and 87 test samples, respectively. Red Rot showed balanced performance with both precision and recall at 0.97 on 116 test samples, the largest class in the test set, indicating strong reliability for these categories, minimizing false positives that could lead to unnecessary treatments and supporting the detection of severe fungal infections (Table 6). This performance is particularly significant, given the economic implications of misdiagnosis in commercial sugarcane cultivation.

A key finding was the correlation between prediction confidence and clinical impact. High-confidence predictions (mean: 0.82) for Healthy and Red Rot samples aligned with their high recall, whereas medium confidence levels for yellow disease (0.49–0.65) appropriately reflected greater diagnostic uncertainty. This calibrated confidence enables risk-aware decision-making, allowing farmers to seek expert validation in uncertain cases.

The radar chart visualization (Figure 5) confirmed a uniformly high performance across all disease classes, with no single disease dominating or underperforming significantly. All metrics exceeded 0.94, indicating that the model learned robust discriminative features for each disease category rather than overfitting to specific visual patterns. The balanced performance profile suggests that the model is ready for comprehensive field deployment across all five disease categories, rather than requiring disease-specific models.

3.4. Uncertainty Analysis

The MC dropout mechanism differentiated between reliable and unreliable predictions, with incorrect classifications exhibiting higher average uncertainty than correct predictions. This trend supports the hypothesis that uncertainty estimates can help flag potential errors (Table 7). The confidence gap of 0.332 between correct (0.82) and incorrect (0.49) predictions demonstrated well-calibrated uncertainty estimates. The statistically significant correlation (r = 0.365, p < 0.001) between uncertainty and prediction errors, while representing a medium effect size, provides a sufficient signal for practical risk stratification. The four-panel uncertainty analysis (Figure 6) revealed that high-confidence predictions (>0.7) achieved 98.2% accuracy, whereas low-confidence predictions (<0.4) dropped to 76.5%, confirming that confidence thresholds can effectively guide trust in model outputs. The calibration curve showed slight under-confidence at high probability ranges, which is a desirable characteristic for safety-agricultural applications, where conservative predictions prevent expensive misdiagnoses. The distinct uncertainty distributions for correct and incorrect predictions (Figure 7) exhibited minimal overlap, enabling practical threshold selection for deployment scenarios. This separation allows the system to automatically flag approximately 15% of the predictions for expert review while maintaining >99% accuracy on the remaining high-confidence cases, enabling selective expert review.

To enhance interpretability for agricultural practitioners, the model’s continuous uncertainty estimates were discretized into three actionable confidence tiers (Table 8). This mapping enables non-expert users to translate probabilistic outputs into clear decision guidance for field-level interventions.

For instance, when a farmer uploads an image of a leaf exhibiting clear rust symptoms, the system may return: “Rust disease detected (high confidence: 85%). Recommended action: apply the appropriate fungicide.” In contrast, for visually ambiguous samples with overlapping symptom patterns, the system provides a conservative response, such as: “Possible yellow disease detected (low confidence: 35%). The prediction is uncertain; please acquire a clearer image or consult an agricultural extension officer before proceeding.” This confidence-based interpretation framework bridges the gap between probabilistic model outputs and practical decision-making, thereby improving trust, usability, and risk-aware adoption in real-world agricultural settings.

3.5. Statistical Validation and Significance Testing

Statistical validation confirmed the model’s overwhelming superiority over baseline approaches, with McNemar’s test yielding χ² = 388.0 (p < 0.001) against random classification and χ² = 375.0 (p < 0.001) against majority class prediction (Table 9).

The model demonstrated large effect sizes relative to the baseline classifiers, supporting the hypothesis that the learned representations captured genuine disease patterns. The relatively short training time resulted from efficient transfer learning with ImageNet pre-trained weights, only fine-tuning rather than training from scratch. Receiver Operating Characteristic (ROC) analysis demonstrated a high discrimination capability across all disease classes, with AUC values exceeding 0.99 for each category (Figure 8). These statistical validations collectively demonstrate that the model’s performance is not only practically significant but also statistically robust, with effect sizes and significance levels far exceeding publication thresholds in all comparative analyses.

3.6. Model Comparisons and Ablation Study

The ablation study revealed that MC dropout contributed marginally to raw accuracy (97.23% vs. 96.83% without MC sampling), representing a 0.40 percentage point improvement. However, this modest accuracy gain understates MC dropout’s primary value: transforming a black-box classifier into an uncertainty-aware diagnostic system capable of identifying unreliable predictions (Table 10). The performance comparison visualization (Figure 9) illustrates the model’s advantage, with confidence intervals for the proposed approach entirely separated from the baseline methods. The minimal overlap between standard and MC-enhanced predictions suggests that uncertainty quantification can be added to existing agricultural AI systems with negligible computational overhead while providing substantial value through risk-aware predictions. Notably, the ablation results validate the architectural choices; removing any component (dropout layers, custom head, or transfer learning) resulted in a performance degradation exceeding 5%, confirming that each element contributes meaningfully to the effectiveness of the final system.

3.7. Model Interpretability

Grad-CAM visualization confirmed that the model focused on disease-relevant leaf regions rather than background artifacts or spurious correlations (Figure 10). Attention heatmaps consistently highlighted symptomatic areas, including rust pustules, mosaic patterns, red rot discoloration, and yellow patches, while largely ignoring healthy tissue and image backgrounds. This targeted attention validates that the CNN learned diagnostically meaningful features rather than dataset-specific biases.

Disease-specific attention patterns revealed distinct focus strategies for each condition in the study. Red Rot detection focused on stem-leaf junction regions, where fungal invasion typically initiates, whereas mosaic classification focused on the characteristic light-dark striping patterns across leaf blades. Rust detection focused on small, dispersed regions corresponding to individual pustules, demonstrating the model’s ability to identify fine-grained symptoms. The yellow disease attention maps showed more diffuse patterns, reflecting the systemic nature of nutritional disorders, which aligned with the slightly lower confidence scores for this class.

A notable finding emerged from the correlation between the attention intensity and prediction uncertainty (Figure 11). High-uncertainty predictions exhibited scattered, inconsistent attention patterns (mean attention coherence: 0.42), whereas confident predictions showed focused, concentrated activation on specific symptomatic regions (coherence: 0.81). This relationship suggests that the model’s uncertainty appropriately reflects the ambiguity in visual features rather than random variability. The single-sample analysis demonstrated that for misclassified cases, attention often focused on disease-relevant regions but with lower intensity or split between multiple disease characteristics, providing interpretable explanations for the model errors. This interpretability enables agricultural experts to understand and potentially correct model predictions, thereby building trust essential for practical adoption.

3.8. Web Platform Deployment

The model was deployed via Hugging Face Spaces, achieving a 2.3 ± 0.5 s response time for complete inference (10 MC passes). The interface provides predictions with confidence scores and probability distributions. The system maintained a 99.2% uptime during the three-month testing period (Figure 12). The 2.3 s inference time for 10 MC passes represents a practical trade-off for the intended agricultural use. The primary deployment scenario involves individual leaf diagnosis for treatment decisions rather than high-throughput screening. A farmer examining suspicious plants typically spends considerably more time photographing leaves and interpreting results than the inference duration. For comparison, Deep Ensembles, the primary alternative uncertainty method, would require approximately 11.5 s (5× inference time) for a comparable uncertainty quality. For users requiring faster throughput, a single forward-pass inference (without MC dropout) achieves 0.23 s response with 96.83% accuracy, although without any uncertainty estimates. Agricultural decision-making operates on hourly and daily timescales, whether to apply treatment today versus tomorrow, rather than requiring millisecond-level responses. Regarding deployment accessibility, the current Hugging Face Spaces implementation requires internet connectivity, which is increasingly available in Indian sugarcane-growing regions with >85% mobile penetration in rural Maharashtra, India. The lightweight architecture (5.4 M parameters, ~22 MB model file) is specifically designed to enable future offline deployment on smartphones, representing a planned development direction. The web interface was designed with non-technical users in mind, featuring a single-action workflow (upload image → receive result), color-coded confidence indicators (green for high, yellow for medium, red for low confidence), plain-language recommendations avoiding technical jargon, and no Registration requirements. Successful large-scale deployment would benefit from integration with existing agricultural extension services, including brief training sessions (15–30 min) demonstrating image capture best practices and the interpretation of the results.

4. Discussion

4.1. Performance Analysis and Benchmarking

The proposed MC-Dropout-MobileNetV3 system achieved 97.23% accuracy on the held-out test set, with uncertainty-aware inference providing a marginal 0.4% improvement over the standard deterministic predictions. This performance is competitive with recent sugarcane disease detection studies, although direct comparison remains challenging owing to dataset variability. Howard et al. [26] reported 86.53% accuracy using DenseNet on a similar dataset, while [31] achieved 98.45% using enhanced DenseNet architectures. However, neither system provided uncertainty quantification, which is the primary contribution of this study. The 1.9% gap between the cross-validation performance (99.13%) and test set accuracy (97.23%) suggests mild overfitting despite the regularization strategies used. This discrepancy likely stems from the limited dataset size (2521 images) and the single-region collection. The high cross-validation accuracy with a low variance (SD: 0.26%) indicates that the model learns consistent patterns within the dataset, but generalization to truly unseen data remains untested.

4.2. Uncertainty Quantification Insights

The correlation between prediction uncertainty and error probability (r = 0.365, p < 0.001) represented a medium effect size, sufficient for practical risk stratification but not strong enough for complete reliance on uncertainty estimates alone. The 5.38-fold higher uncertainty for misclassifications shows the discrimination capability, although the overlapping distributions suggest that uncertainty thresholds require careful calibration for deployment scenarios. The observation that high-confidence predictions (>0.7) achieved 98.2% accuracy, whereas low-confidence predictions (<0.4) dropped to 76.5%, validates the utility of confidence-based triage. In practical terms, this enables automated processing for approximately 85% of cases, while flagging 15% for manual review. However, the absence of proper calibration metrics (Expected Calibration Error, Maximum Calibration Error) limits our understanding of whether confidence scores accurately represent true probabilities.

This scenario illustrates the practical value of uncertainty quantification, which extends beyond raw accuracy improvements. Scenario: Pesticide Application Decision. Consider a farmer observing leaf discoloration that could indicate either yellow disease (nutrient deficiency, treatment: fertilizer adjustment, cost: ~$20/hectare) or early-stage Red Rot (fungal infection, treatment: fungicide application, cost: ~$80/hectare + potential crop loss if untreated). Without uncertainty quantification, a deterministic model predicts “Yellow Disease” with no confidence. Farmers apply fertilizers. If the prediction was incorrect and the actual condition was Red Rot, the fungal infection spread, potentially causing a 30–50% yield loss. With our uncertainty-aware system, the model predicts “Yellow Disease” but with LOW confidence (38%), indicating that the prediction is unreliable. Farmers are advised to seek expert verification. Upon closer examination by an agronomist, the condition was correctly identified as early Red Rot, enabling timely fungicide application and preventing significant crop loss. This scenario shows that the 0.4% accuracy improvement from the MC dropout understates its practical value; the critical contribution is knowing when not to trust the prediction, enabling risk-aware decision-making that prevents costly misdiagnoses.

4.3. Architectural Considerations

MobileNetV3-Large provided an effective backbone with only 5.4 M parameters, achieving performance comparable to larger models while maintaining deployment feasibility. The 2.3 s inference time for 10 MC passes represents a reasonable trade-off between uncertainty quality and response time. However, this study did not explore alternative uncertainty quantification methods, such as deep ensembles, temperature scaling, or evidential deep learning, which might provide better-calibrated uncertainties with different computational trade-offs. The dual dropout configuration (0.2, 0.3) was selected through a limited grid search; however, more sophisticated approaches, such as learned dropout rates or spatially adaptive dropout, might improve uncertainty estimates. Additionally, the fixed number of MC samples (T = 10) represents another hyperparameter that can be optimized based on the uncertainty convergence rate. Although this study utilized MC dropout for uncertainty quantification, alternative methodologies exist, each with distinct trade-offs. Deep Ensembles [32], which involve training multiple independent networks (typically five) and aggregating their predictions, offer superior calibration but necessitate a fivefold increase in model parameters and inference time. This requirement renders them impractical for mobile deployment, as our 5.4 M parameter model would expand to 27 M. Bayesian Neural Networks [33] provide principled uncertainty quantification through weight distributions but require specialized variational inference training, double the parameters (to store both the mean and variance per weight), and often exhibit optimization instability. Evidential Deep Learning [34] facilitates single-pass uncertainty estimation via Dirichlet parameterization but requires specialized loss functions and meticulous hyperparameter tuning, with limited validation in agricultural contexts. MC dropout was chosen because of its zero additional parameter introduction, lack of architectural modifications, and seamless integration into existing training pipelines, while offering well-calibrated uncertainty estimates suitable for resource-constrained deployment [35]. A systematic empirical comparison of these methods using agricultural data sets constitutes a promising direction for future research.

4.4. Interpretability Analysis

Grad-CAM visualizations confirmed that the model focused on disease-relevant regions with distinct attention patterns for each condition. The correlation between attention coherence and prediction confidence provides an interpretable signal for the sources of uncertainty. However, the qualitative nature of this analysis limits its rigor. Future work should employ quantitative metrics for attention consistency and explore whether attention-based uncertainty measures could complement MC dropout. The observation that misclassified cases often showed split attention between multiple disease characteristics suggests that the model struggled with ambiguous visual presentations. This finding highlights the importance of uncertainty quantification in cases where visual symptoms overlap between diseases.

4.5. Study Limitations

We also identified a few limitations of this study, including (1) Geographic constraints: The dataset is exclusively representative of Maharashtra, India, which may introduce regional bias and affect generalizability. The sugarcane varieties cultivated in Maharashtra, such as Co 86032 and CoM 0265, differ from those in Brazil (e.g., RB varieties) or Australia, and the presentation of disease symptoms can vary with local agroclimatic conditions, including humidity, temperature, and pathogen strain. However, the uncertainty quantification mechanism partially mitigated this limitation. When the model encounters unfamiliar regional characteristics, an increase in prediction uncertainty is anticipated, thereby automatically flagging potentially unreliable classifications for expert review, rather than producing overconfident errors. Nonetheless, pilot validation with local data is recommended prior to deployment in new agroclimatic zones in the future. (2) Dataset size and diversity: The data used in this study only included 2521 images across five categories, which may not capture the full spectrum of the disease. (3) The study lacks field validation because the model performance can vary under different field conditions, including variable lighting, camera angles, or image quality. Additionally, the model has not been validated under challenging field conditions, including strong direct sunlight, shadows from adjacent plants, moisture on leaf surfaces, or partially occluded leaves. Although the uncertainty quantification mechanism is expected to flag such challenging inputs with increased prediction uncertainty, systematic evaluation under these specific conditions remains essential for future work.

Despite these limitations, this study shows that uncertainty quantification can be integrated into agricultural AI systems without a significant computational burden. Web deployment shows technical feasibility, although adoption barriers extend beyond technology. Economic constraints, digital literacy, and trust in AI recommendations must be considered for successful deployment. The potential value of the system lies in preventing both under-treatment (missing diseases) and over-treatment (unnecessary pesticide application).

4.6. Future Directions

Several research directions could address the current limitations, including (1) dataset expansion with multi-region data collection across diverse sugarcane varieties and environmental conditions to improve generalizability. (2) A comparative evaluation of uncertainty methods (ensembles, evidential networks, conformal prediction) could identify optimal approaches for agricultural applications. (3) Participatory research with farmers examining trust, usability, and the decision-making impact of uncertainty estimates is essential. Longitudinal studies tracking disease management outcomes will provide evidence of practical value. (4) The integration potential of combining visual diagnosis with environmental data (weather, soil conditions), and historical disease patterns could improve prediction confidence and provide context-aware recommendations

5. Conclusions

This study integrated MC dropout uncertainty quantification into a MobileNetV3-based sugarcane disease detection system, achieving a test accuracy of 97.23% across five disease categories. The primary contribution lies in demonstrating that prediction confidence can be quantified in agricultural AI applications without sacrificing accuracy or requiring additional model parameters. The system successfully discriminated between reliable and unreliable predictions, with incorrect classifications showing a 5.38-fold higher uncertainty than correct predictions. The correlation between uncertainty and error probability (r = 0.365, p < 0.001) enabled risk-stratified decision-making, with high-confidence predictions achieving 98.2% accuracy. This allows automated processing for most cases while identifying approximately 15% that require expert review. The lightweight architecture (5.4 M parameters) and web deployment demonstrate the technical feasibility of resource-constrained agricultural settings. However, significant limitations constrain their current applicability. The geographically restricted dataset from Maharashtra, India, prevents claims of broader generalizability of the results. The absence of field validation with agricultural practitioners limits the understanding of practical utility and adoption potential. These constraints must be addressed before large-scale deployment. Future research should prioritize multi-region data collection, comparative evaluations of uncertainty quantification methods, and participatory field trials with farmers. The integration of environmental context and temporal disease progression could enhance the early detection capabilities. While this study establishes technical feasibility, translating uncertainty-aware AI into trusted agricultural practices requires addressing socio-technical factors beyond algorithmic performance. The framework presented in this study provides a foundation for uncertainty-aware plant disease detection systems. As agricultural AI continues to evolve, incorporating prediction confidence alongside accuracy will be essential for building practitioner trust and enabling risk-aware disease management decisions.

Author Contributions

Conceptualization, P.P. and C.M.B.; methodology, P.P.; software, P.P.; validation, P.P., M.R.G. and M.A.; formal analysis, C.M.B.; investigation, P.P.; resources, P.P.; data curation, C.M.B.; writing—original draft preparation, P.P. and C.M.B.; writing—review and editing, C.M.B. and M.R.G.; visualization, M.A.; supervision, M.A.; project administration, C.M.B.; funding acquisition, C.M.B. All authors have read and agreed to the published version of the manuscript.

Funding

Funding for open access to this research was provided by University of Tennessee’s Open Publishing Support Fund.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset generated during this study is publicly available at https://www.kaggle.com/datasets/nirmalsankalana/sugarcane-leaf-disease-dataset (accessed on 20 September 2025). Code availability: https://github.com/pathmanaban86/uncertainty_sugarcane_classifier (accessed on 20 September 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Budeguer, F.; Enrique, R.; Perera, M.F.; Racedo, J.; Castagnaro, A.P.; Noguera, A.S.; Welin, B. Genetic Transformation of Sugarcane, Current Status and Future Prospects. Front. Plant Sci. 2021, 12, 768609. [Google Scholar] [CrossRef] [PubMed]
Dukhnytskyi, B. World Agricultural Production. Ekon. APK. 2024, 26, 59–65. [Google Scholar] [CrossRef]
Akansha Arora. Top-10 Sugarcane Producing Countries in the World 2024. Available online: https://currentaffairs.adda247.com/top-10-sugarcane-producing-countries-in-the-world/ (accessed on 17 October 2025).
Lu, G.; Wang, Z.; Xu, F.; Pan, Y.-B.; Grisham, M.P.; Xu, L. Sugarcane Mosaic Disease: Characteristics, Identification and Control. Microorganisms 2021, 9, 1984. [Google Scholar] [CrossRef]
Huang, W.; Wang, S.; Ge, C.; Wei, L.; Du, D.; Niu, Z.; Li, M.; Zheng, Z. Structural Optimization and Performance Evaluation of a Sugarcane Leaf Mulching Machine. Smart Agric. Technol. 2025, 12, 101116. [Google Scholar] [CrossRef]
Bhuiyan, S.A.; Sherring, K.; Eglinton, J. Parasitic Nematodes of Sugarcane: A Major Productivity Impediment and Grand Challenges in Management. Plant Dis. 2024, 108, 2945–2957. [Google Scholar] [CrossRef]
Viswanathan, R. Impact of Yellow Leaf Disease in Sugarcane and Its Successful Disease Management to Sustain Crop Production. Indian Phytopathol. 2021, 74, 573–586. [Google Scholar] [CrossRef]
Sharma, R.; Rallapalli, S.; Magner, J. Optimizing Water-Efficient Agriculture: Evaluating the Sustainability of Soil Management and Irrigation Synergies Using Fuzzy Extent Analysis. Sci. Rep. 2025, 15, 29382. [Google Scholar] [CrossRef]
Sharma, P.; Sharma, A. A Novel Plant Disease Diagnosis Framework by Integrating Semi-Supervised and Ensemble Learning. J. Plant Dis. Prot. 2024, 131, 177–198. [Google Scholar] [CrossRef]
Pradhan, P.; Kumar, B.; Mohan, S. Comparison of Various Deep Convolutional Neural Network Models to Discriminate Apple Leaf Diseases Using Transfer Learning. J. Plant Dis. Prot. 2022, 129, 1461–1473. [Google Scholar] [CrossRef]
Kunduracioglu, I.; Pacal, I. Advancements in Deep Learning for Accurate Classification of Grape Leaves and Diagnosis of Grape Diseases. J. Plant Dis. Prot. 2024, 131, 1061–1080. [Google Scholar] [CrossRef]
Mohanty, S.P.; Hughes, D.P.; Salathé, M. Using Deep Learning for Image-Based Plant Disease Detection. Front. Plant Sci. 2016, 7, 215232. [Google Scholar] [CrossRef]
Shoaib, M.; Shah, B.; EI-Sappagh, S.; Ali, A.; Ullah, A.; Alenezi, F.; Gechev, T.; Hussain, T.; Ali, F. An Advanced Deep Learning Models-Based Plant Disease Detection: A Review of Recent Research. Front. Plant Sci. 2023, 14, 1158933. [Google Scholar] [CrossRef]
Abdullahi, H.S. Fast and Accurate Image Feature Detection for On-the-Go Field Monitoring Through Precision Agriculture: Computer Predictive Modelling for Farm Image Detection and Classification with Convolution Neural Network (CNN). Ph.D. Thesis, University of Bradford, Bradford, UK, 2020. [Google Scholar]
Kaushik, I.; Prakash, N.; Jain, A. Plant Disease Detection Using a Depth-Wise Separable-Based Adaptive Deep Neural Network. Multimed. Tools Appl. 2025, 84, 887–915. [Google Scholar] [CrossRef]
Liu, J.; Wang, X. A Multimodal Framework for Pepper Diseases and Pests Detection. Sci. Rep. 2024, 14, 28973. [Google Scholar] [CrossRef]
Ayyad, S.M.; Sallam, N.M.; Gamel, S.A.; Ali, Z.H. Particle Swarm Optimization with YOLOv8 for Improved Detection Performance of Tomato Plants. J. Big Data. 2025, 12, 152. [Google Scholar] [CrossRef]
Hernández, S.; López, J.L. Uncertainty Quantification for Plant Disease Detection Using Bayesian Deep Learning. Appl. Soft Comput. 2020, 96, 106597. [Google Scholar] [CrossRef]
Li, W.Z.; Ma, S.C.; Wang, G.Y.; Huo, P.; Zhou, B.C.; Ma, J.Z.; Xie, Y.S.; Guo, C.; Wang, E.Z.; Yang, S. Detection of Sugarcane Stalk Node Based on Improved YOLOv8 and Its Deployment on Edge Device. Smart Agric. Technol. 2025, 12, 101385. [Google Scholar] [CrossRef]
Khan, A.T.; Jensen, S.M.; Khan, A.R.; Li, S. Plant Disease Detection Model for Edge Computing Devices. Front. Plant Sci. 2023, 14, 1308528. [Google Scholar] [CrossRef] [PubMed]
Alemán-Montes, B.; Serra, P.; Zabala, A.; Masó, J.; Pons, X. A near Real-Time Spatial Decision Support System for Improving Sugarcane Monitoring through a Satellite Mapping Web Browser. Smart Agric. Technol. 2025, 12, 101084. [Google Scholar] [CrossRef]
Xu, Y.; Khan, T.M.; Song, Y.; Meijering, E. Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey. Artif. Intell. Rev. 2025, 58, 93. [Google Scholar] [CrossRef]
Limpamont, A.; Kittipanya-ngam, P.; Chindasombatcharoen, N.; Cavite, H.J.M. Towards Agri-food Industry Sustainability: Addressing Agricultural Technology Adoption Challenges through Innovation. Bus. Strategy Environ. 2024, 33, 7352–7367. [Google Scholar] [CrossRef]
Sánchez, E.; Calderón, R.; Herrera, F. Artificial Intelligence Adoption in SMEs: Survey Based on TOE–DOI Framework, Primary Methodology and Challenges. Appl. Sci. 2025, 15, 6465. [Google Scholar] [CrossRef]
Chahbouni, A.; El Manaa, K.; Abouch, Y.; El Manaa, I.; Bossoufi, B.; El Ghzaoui, M.; El Alami, R. Attention-Guided Differentiable Channel Pruning for Efficient Deep Networks. Mach. Learn. Knowl. Extr. 2025, 7, 110. [Google Scholar] [CrossRef]
Daphal, S.D.; Koli, S.M. Enhanced Deep Learning Technique for Sugarcane Leaf Disease Classification and Mobile Application Integration. Heliyon 2024, 10, e29438. [Google Scholar] [CrossRef] [PubMed]
Devi, B.S.; Chatrapati, K.S.; Sandhya, N. Enhanced Sugarcane Disease Detection Using DenseNet201 and DenseNet264 with Transfer Learning and Fine-Tuning. Front. Health Inform. 2024, 13, 687–713. [Google Scholar]
Nitin; Gupta, S.B.; Yadav, R.; Bovand, F.; Tyagi, P.K. Developing Precision Agriculture Using Data Augmentation Framework for Automatic Identification of Castor Insect Pests. Front. Plant Sci. 2023, 14, 1101943. [Google Scholar] [CrossRef] [PubMed]
Whata, A.; Dibeco, K.; Madzima, K.; Obagbuwa, I. Uncertainty Quantification in Multi-Class Image Classification Using Chest X-Ray Images of COVID-19 and Pneumonia. Front. Artif. Intell. 2024, 7, 1410841. [Google Scholar] [CrossRef]
Howard, A.; Sandler, M.; Chen, B.; Wang, W.J.; Chen, L.-C.; Tan, M.X.; Chu, G.; Vasudevan, V.; Zhu, Y.K.; Pang, R.M.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
Hussain, A.; Barua, B.; Osman, A.; Abozariba, R.; Asyhari, A.T. Performance of MobileNetV3 Transfer Learning on Handheld Device-Based Real-Time Tree Species Identification. In Proceedings of the 26th International Conference on Automation and Computing (ICAC 2021), Portsmouth, UK, 15 November 2021; pp. 1–6. [Google Scholar]
Lakshminarayanan, B.; Pritzel, A.; Blundell, C. Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 6405–6416. [Google Scholar]
Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D. Weight Uncertainty in Neural Network. In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), Lille, France, 6–11 July 2015; pp. 1613–1622. [Google Scholar]
Sensoy, M.; Kaplan, L.; Kandemir, M. Evidential Deep Learning to Quantify Classification Uncertainty. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, QC, Canada, 2–8 December 2018; Curran Associates Inc.: Red Hook, NY, USA, 2018; pp. 3183–3193. [Google Scholar]
Gal, Y.; Ghahramani, Z. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA, 20–22 June 2016; Balcan, M.F., Weinberger, K.Q., Eds.; PMLR: New York, NY, USA, 2016; Volume 48, pp. 1050–1059. [Google Scholar]

Figure 1. Sample images of leaf disease: (a) Healthy; (b) Mosaic; (c) Red rot; (d) Rust; (e) Yellow.

Figure 2. Proposed architecture for the sugar disease identification model.

Figure 3. Model training dynamics showing the learning curves.

Figure 4. Confusion matrix of the proposed model: (a) raw counts and (b) normalized.

Figure 5. Radar chart.

Figure 6. Uncertainty Quantification analysis.

Figure 7. Uncertainty distribution.

Figure 8. ROC curves.

Figure 9. Accuracy comparison. Comparative analysis of proposed method against existing approaches. (a) Accuracy comparison: proposed method (97.23%) outperforms uncertainty-aware approach [18] (96.5%), multimodal framework [16] (94.2%), and sugarcane-specific method [26] (86.53%). (b) Feature comparison matrix showing proposed method uniquely combines uncertainty quantification, mobile optimization, global accessibility, and real-time inference. Values: 1.0 (full support), 0.5 (partial), 0.0 (none).

Figure 10. Grad CAM Analysis: (a) attention map, (b) disease-specific, and (c) single sample.

Figure 11. Uncertainty-attention correlation.

Figure 12. Web deployment screenshot.

Table 1. Comparative summary of previous studies on deep learning–based plant disease detection.

Study	Crop/Dataset	Method	Accuracy	Strengths	Limitations
Mohanty et al. [12]	plant Village (54,306 images, 38 classes)	CNN (AlexNet, GoogLeNet)	99.35%	Large-scale validation; multiple crop species	No uncertainty quantification; lab-controlled images only; high computational cost
Shoaib et al. [13]	Multiple crops (Review)	Various CNN architectures	Variable	Comprehensive survey of deep learning methods	Identifies lack of interpretability and confidence measures as key gaps
Kaushik et al. [15]	Potato	Depth-wise separable adaptive DNN	97.8%	Lightweight architecture; adaptive learning	Single crop focus; no confidence calibration; limited interpretability
Liu & Wang [16]	Pepper	Multimodal framework	94.2%	Combines visual and contextual features	Complex multi-input pipeline; no uncertainty estimation; resource demanding
Ayyad et al. [17]	Tomato	PSO + YOLOv8	96.8%	Optimized detection; real-time capability	Detection (not classification) focus; no uncertainty quantification; requires GPU
Hernández & López [18]	Plant Village	Bayesian Deep Learning	96.5%	Uncertainty quantification capability	Computationally expensive; complex implementation; not optimized for mobile deployment
Daphal & Koli [26]	Sugarcane (Maharashtra)	DenseNet	86.53%	Sugarcane-specific; mobile app integration	Lower accuracy; no uncertainty estimates; heavy architecture
Devi et al. [27]	Sugarcane	DenseNet201/264 with transfer learning	98.45%	High accuracy; fine-tuning approach	No uncertainty quantification; computationally intensive; black-box predictions
Proposed Method	Sugarcane (5 classes)	MC-Dropout-MobileNetV3	97.23%	Uncertainty quantification; lightweight (5.4 M params); interpretable (Grad-CAM); web-deployable; 2.3 s inference	Single-region dataset; no field validation yet

Table 2. Characteristics of the sugarcane leaf disease dataset used in this study.

Disease Class	Trainset	Validation Set	Test Set	Total Images	Class Distribution (%)	Image Quality
Healthy	339	78	89	506	17.6	High
Mosaic	301	69	87	457	17.2	Variable
Red Rot	336	78	116	530	23.0	High
Rust	334	77	103	514	20.4	High
Yellow	328	76	110	514	21.8	Variable
Total	1638	378	505	2521	100	Mixed

Table 3. Model training configuration and hyperparameters.

Component	Parameter	Value	Justification
Architecture	Base Model	MobileNetV3-Large	Lightweight (5.4 M params) for mobile deployment
	Classifier Dimensions	960→1280→640→5	Progressive dimensionality reduction
	MC Dropout Rates	0.2, 0.3	Balanced uncertainty-accuracy trade-off
Data Split	Training/Validation	80%/20%	Standard split for model evaluation
	Batch Size	32 (×2 accumulation)	Memory-efficient with effective size of 64
Optimization	Optimizer	AdamW	Superior weight decay regularization
	Initial Learning Rate	1 × 10⁻³	Standard for transfer learning
	Weight Decay	1 × 10⁻⁴	L2 regularization to prevent overfitting
	Label Smoothing	0.1	Improved generalization
Scheduling	LR Scheduler	Cosine Annealing LR	Smooth convergence
	T_max	100 epochs	Full cosine cycle
	Early Stopping	15 epochs	Prevent overfitting
Augmentation	Random Crop Scale	0.8–1.0	Preserve disease features
	Rotation Range	±30°	Natural variation
	Color Jitter	0.3 (B,C,S), 0.1 (H)	Lighting invariance
Uncertainty	MC Samples	10	Balance speed-reliability
	Confidence Thresholds	<0.4, 0.4–0.7, >0.7	Low/Medium/High uncertainty bins
Training	Max Epochs	100	Sufficient for convergence
	Gradient Clipping	1.0	Stability during training
	Workers	2	Optimal for Colab environment

Note: Dropout rates were selected through preliminary experiments on validation data (assessed: 0.1–0.5 in 0.1 increments). The asymmetric configuration (0.2, 0.3) achieved an optimal balance between classification accuracy and uncertainty-error correlation. Lower rates (<0.2) provided insufficient stochasticity for meaningful uncertainty differentiation, while higher rates (>0.4) degraded accuracy by approximately 1.5–2%. MC samples (T = 10) were selected following literature recommendations [1], achieving stable uncertainty estimates with practical inference time (2.3 s); higher values (T > 15) yielded marginal improvement (<0.1%) with proportionally increased computational cost.

Table 4. Summary of training metrics.

Metric	Initial (Epoch 1)	Peak Value	Final (Epoch 25)	Best Model
Training Accuracy	71.53%	99.11% (Epoch 23)	98.76%	-
Validation Standard Acc	85.35%	95.45% (Epoch 21, 24)	95.45%	Epoch 21
Validation MC Accuracy	86.53%	95.45% (Epoch 21)	95.25%	Epoch 21
Training Loss	0.9438	0.4159 (Epoch 23)	0.4173	-
Validation Loss	0.7680	0.4870 (Epoch 25)	0.4870	0.4910 (Epoch 21)
Learning Rate	1.000 × 10⁻³	1.000 × 10⁻³ (Epoch 1)	4.000 × 10⁻⁶	9.500 × 10⁻⁵ (Epoch 21)
Overfitting Gap	13.82%	3.66% (Epoch 23)	3.51%	3.00% (Epoch 21)

Note: The best model was selected at epoch 21 based on the highest validation MC accuracy (95.45%). The training continued through Epoch 25 without stopping early.

Table 5. Comprehensive Model Validation and Performance Analysis.

Validation Method	Metric	Value	95% CI	SE	Status
Cross-Validation (5-fold)	Mean Accuracy	99.13%	[98.80%, 99.45%]	0.12%	Excellent
	Standard Deviation	0.26%	-	-	Very Stable
	Coefficient of Variation	0.002	-	-	Minimal Variance
Split Robustness (5 seeds)	Mean Accuracy	99.25%	[98.84%, 99.65%]	0.15%	Highly Robust
Split Robustness (5 seeds)	Standard Deviation	0.33%	-	-	Consistent
Bootstrap Validation	Cross-Validation (n = 1000)	99.13%	[98.93%, 99.32%]	0.10%	Validated
Bootstrap Validation	Split Robustness (n = 1000)	99.25%	[98.97%, 99.49%]	0.13%	Confirmed
Final Test Performance	MC Accuracy	97.23%	-	-	High Performance
Final Test Performance	Standard Accuracy	96.83%	-	-	Strong Baseline

Table 6. Disease-Specific Classification Performance.

Disease Class	Precision	Recall	F1-Score	Support
Healthy	0.98	1.00	0.99	103
Mosaic	0.98	0.95	0.97	105
Red Rot	0.98	1.00	0.99	100
Rust	0.98	0.95	0.97	105
Yellow	0.94	0.96	0.95	92
Overall	0.97	0.97	0.97	505

Table 7. Uncertainty Quantification and Correlation Analysis.

Uncertainty Metric	Value	95% CI	Statistical Test	p-Value	Interpretation
Prediction Uncertainty
Mean (Correct Predictions)	0.0008	-	-	-	Very Low Uncertainty
Mean (Incorrect Predictions)	0.0043	-	-	-	Higher Uncertainty
Uncertainty Separation	5.38 × higher	-	-	-	Good Discrimination
Confidence Analysis
Mean Confidence (Correct)	0.8199	-	-	-	High Confidence
Mean Confidence (Incorrect)	0.4879	-	-	-	Lower Confidence
Confidence Gap	0.332	-	-	-	Clear Separation
Correlation Testing
Uncertainty-Error Correlation	r = 0.365	[0.287, 0.439]	t = 8.801 (df = 503)	<0.001 ***	Medium Effect Size
Sample Size	n = 505	-	-	-	Adequate Power

Note: *** p < 0.001 (two-tailed test). Stratified by confidence bins: low confidence (<0.4), accuracy = 76.5% (n ≈ 75); medium confidence (0.4–0.7), accuracy = 94.1% (n ≈ 150); high confidence (>0.7), accuracy = 98.2% (n ≈ 280).

Table 8. User-oriented interpretation of prediction confidence levels.

Confidence Level	Probability Threshold	Recommended User Action	Observed Accuracy (%)
High	>70%	Proceed with the recommended treatment	98.2
Medium	40–70%	Exercise caution; expert consultation advised	94.1
Low	<40%	Do not act without verification; seek expert assessment	76.5

Table 9. Statistical Significance Testing and Effect Size Analysis.

Comparison Test	Test Statistic	p-Value	Effect Size	95% CI	Interpretation
One-Sample t-Tests
Model vs. Random Baseline	t = 672.4 (df = 4)	<0.001 ***	Cohen’s d = 300.7	-	Extremely Large Effect
Model vs. Majority Class	t = 648.9 (df = 4)	<0.001 ***	Cohen’s d = 290.2	-	Extremely Large Effect
Paired Comparison
MC vs. Standard	W = 0.0	0.062	Δ = 0.39%	-	Marginal Improvement
McNemar’s Tests
Model vs. Random	χ² = 388.0 (df = 1)	<0.001 ***	-	-	Statistically Significant (p < 0.001)
Model vs. Majority Class	χ² = 375.0 (df = 1)	<0.001 ***	-	-	Statistically Significant (p < 0.001)

Note: *** p < 0.001. Effect sizes: d = Cohen’s d, Δ = mean difference.

Table 10. Ablation Study and Baseline Comparisons.

Model Configuration	Accuracy (%)	Improvement over Random (%)	Improvement over Majority (%)	Key Features
Full Model (Ours)	97.23	+77.23	+74.46	MC dropout + Uncertainty
Without MC Sampling	96.83	+76.83	+74.06	Standard Inference
Random Baseline	20.00	-	−2.77	Theoretical Lower Bound
Majority Class	22.77	+2.77	-	Always Predict “Healthy”
Final Test Performance	97.23	+77.23	+74.46	Real-world Validation

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Pugazhendi, P.; Badgujar, C.M.; Ganapathy, M.R.; Arumugam, M. Uncertainty-Aware Deep Learning for Sugarcane Leaf Disease Detection Using Monte Carlo Dropout and MobileNetV3. AgriEngineering 2026, 8, 31. https://doi.org/10.3390/agriengineering8010031

AMA Style

Pugazhendi P, Badgujar CM, Ganapathy MR, Arumugam M. Uncertainty-Aware Deep Learning for Sugarcane Leaf Disease Detection Using Monte Carlo Dropout and MobileNetV3. AgriEngineering. 2026; 8(1):31. https://doi.org/10.3390/agriengineering8010031

Chicago/Turabian Style

Pugazhendi, Pathmanaban, Chetan M. Badgujar, Madasamy Raja Ganapathy, and Manikandan Arumugam. 2026. "Uncertainty-Aware Deep Learning for Sugarcane Leaf Disease Detection Using Monte Carlo Dropout and MobileNetV3" AgriEngineering 8, no. 1: 31. https://doi.org/10.3390/agriengineering8010031

APA Style

Pugazhendi, P., Badgujar, C. M., Ganapathy, M. R., & Arumugam, M. (2026). Uncertainty-Aware Deep Learning for Sugarcane Leaf Disease Detection Using Monte Carlo Dropout and MobileNetV3. AgriEngineering, 8(1), 31. https://doi.org/10.3390/agriengineering8010031

Article Metrics

Article metric data becomes available approximately 24 hours after publication online.

Article Menu

Uncertainty-Aware Deep Learning for Sugarcane Leaf Disease Detection Using Monte Carlo Dropout and MobileNetV3

Abstract

1. Introduction

2. Materials and Methods

2.1. Data Description

2.2. Data Preprocessing and Augmentations

2.3. Model Architecture Development

2.4. Model Training Strategy

2.5. Uncertainty Quantification Framework

2.6. Grad-CAM Interpretability

2.7. Validation Framework

2.8. Performance Analysis

2.9. Implementation Details

3. Results

3.1. Training Dynamics and Model Convergence

3.2. Overall Classification Performance

3.3. Disease-Specific Performance Analysis

3.4. Uncertainty Analysis

3.5. Statistical Validation and Significance Testing

3.6. Model Comparisons and Ablation Study

3.7. Model Interpretability

3.8. Web Platform Deployment

4. Discussion

4.1. Performance Analysis and Benchmarking

4.2. Uncertainty Quantification Insights

4.3. Architectural Considerations

4.4. Interpretability Analysis

4.5. Study Limitations

4.6. Future Directions

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI