Deep Learning-Driven Plant Pathology Assistant: Enabling Visual Diagnosis with AI-Powered Focus and Remediation Recommendations for Precision Agriculture
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Comments:
Overall Evaluation:
This paper presents a well-structured and technically solid study on a Synergistic Dual-Augmentation and Class-Aware Hybrid (SDA-CAH) model for plant disease recognition, achieving impressive results on the PlantVillage dataset and demonstrating high interpretability through Grad-CAM visualization and a web-based implementation. The paper is clearly written and methodologically sound, with meaningful contributions to precision agriculture. However, several aspects require minor revisions before publication.
Strengths:
- The paper addresses three critical challenges—class imbalance, heterogeneous image quality, and interpretability—in a unified framework.
- The proposed dual-augmentation and class-aware sampling strategies are clearly justified and well-integrated.
- The addition of Grad-CAM visualizations and a practical web deployment significantly enhance the work’s application value.
- The writing is generally fluent and logically organized, following the typical structure of AgriEngineering papers.
Weaknesses and Minor Revisions:
- Overemphasis on near-perfect accuracy: While 99.95% accuracy is impressive, such results on the PlantVillage dataset may raise concerns about potential overfitting or data leakage. The authors should clarify how data leakage was avoided and whether cross-dataset or field data validation was conducted.
- Limited generalization discussion: The discussion should better address real-world variability beyond PlantVillage, for example, how SDA-CAH might perform on field images with noise, occlusion, or new disease types.
- Comparison fairness: Please provide additional details on the hyperparameter settings for the baseline models (Xception, EfficientNet-B0) to ensure fair comparison.
- Figure quality: Some figures (especially confusion matrices and Grad-CAM visualizations) could be improved for readability. Please increase resolution and add clear color legends.
- Language polishing: Minor grammatical and stylistic improvements are suggested, especially in the Abstract and Discussion sections (e.g., "the SDA-CAH not only surpasses..." → "SDA-CAH not only surpasses...").
- Practical implications: The section on system implementation is valuable but could briefly include computational efficiency data (e.g., inference time or hardware resource consumption) to demonstrate practical feasibility.
Recommendation:
The paper is of high quality and relevance to the AgriEngineering readership and will likely attract attention within precision agriculture and applied AI communities once the minor issues are addressed.
Author Response
Dear Reviewer,
We would like to express our heartfelt appreciation for your thoughtful and detailed review of our manuscript. Your insightful comments and constructive suggestions have been invaluable in helping us enhance the scientific rigor, clarity, and overall quality of this work. We are truly grateful for the time and effort you dedicated to providing such comprehensive feedback. In response, we have carefully revised the manuscript and addressed each point with great attention and sincerity. Below, we present our detailed, point-by-point responses to all your comments.
Comments 1: Overemphasis on near-perfect accuracy: While 99.95% accuracy is impressive, such results on the PlantVillage dataset may raise concerns about potential overfitting or data leakage. The authors should clarify how data leakage was avoided and whether cross-dataset or field data validation was conducted.
Response 1: We sincerely thank you for raising this important point regarding the reliability and interpretability of our reported performance metrics. We fully agree that near-perfect accuracy on a benchmark dataset requires careful scrutiny to ensure that the results genuinely reflect model capability rather than potential data leakage or overfitting.
To address this, we have substantially revised Sections 3.2.7 (“Training and Evaluation”, paragraphs 2 and 3, page 15), 4.1 (“Quantitative Results”, paragraphs 2 and 3, pages 19 and 20), and 5.2 (“Limitations and Future Work”, paragraph 1, page 31) to provide a comprehensive clarification and transparent discussion, as summarized below:
1. Prevention of Data Leakage: We implemented a rigorous stratified sampling strategy to partition the PlantVillage dataset (38 classes, >50,000 images) into 70% training, 15% validation, and 15% test sets using a fixed random seed of 42. The split was conducted at the image level, ensuring that no image or its augmented variant appeared across different subsets. This process prevents data duplication and guarantees that the evaluation results reflect genuine model generalization rather than memorization. The detailed dataset split is presented in Table 3, and the description has been expanded to emphasize reproducibility and fairness across all classes.
2. Clarification of Accuracy Scope and Real-World Relevance: In Section 4.1, we explicitly acknowledge that the PlantVillage dataset represents a controlled, laboratory environment with homogeneous backgrounds and illumination. Therefore, the reported 99.95% accuracy indicates strong in-domain performance rather than cross-domain robustness. We have clarified that no cross-dataset or in-field validation was conducted in the present study, and the results should be interpreted accordingly. The discussion now explicitly distinguishes between controlled benchmarking and real-world generalization.
3. Future Cross-Domain Evaluation Plans: To strengthen the scientific transparency of the work, we have included in Section 5.2 (“Limitations and Future Work”) a detailed statement outlining our plans to validate the SDA-CAH framework on heterogeneous, field-acquired datasets (e.g., real-time agricultural image streams) to rigorously assess cross-domain generalization. We also highlighted the need for lightweight deployment and multimodal integration as future research directions to further enhance real-world applicability.
4. Balanced Interpretation of Results: The revised text now explicitly advises that while SDA-CAH sets a new benchmark on PlantVillage, such near-perfect performance “must be interpreted with caution.” We emphasize that the results primarily demonstrate the framework’s internal consistency, interpretability, and class-balancing capability within a controlled benchmark, rather than suggesting universal performance across all agricultural scenarios.
Collectively, these revisions clarify our experimental protocol, ensure methodological transparency, and provide a balanced interpretation of the model’s performance within its experimental scope. We are grateful for your insightful comment, which prompted us to improve the rigor and realism of our discussion.
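The stratified 70/15/15 partition with a fixed seed of 42 described in point 1 can be illustrated with a short sketch. This is a standard-library illustration only, not the code used in the manuscript: the function name `stratified_split` and the toy label list are hypothetical, and only the split fractions and the seed come from the response above.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists, split separately within each class."""
    rng = random.Random(seed)  # fixed seed so the partition is reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = int(round(fractions[0] * len(indices)))
        n_val = int(round(fractions[1] * len(indices)))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy, imbalanced labels standing in for the real image labels (hypothetical).
labels = ["healthy"] * 200 + ["rust"] * 100 + ["scab"] * 20
train_idx, val_idx, test_idx = stratified_split(labels)

# Each original image index lands in exactly one subset. Augmentation would
# then be applied to the training indices only, after the split.
print(len(train_idx), len(val_idx), len(test_idx))
```

Because every index is assigned before any augmentation is generated, an augmented variant cannot end up in a different subset from its source image, which is the property point 1 relies on.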
Comments 2: Limited generalization discussion: The discussion should better address real-world variability beyond PlantVillage, for example, how SDA-CAH might perform on field images with noise, occlusion, or new disease types.
Response 2: We sincerely thank you for this insightful and constructive suggestion. We fully agree that emphasizing generalization beyond the controlled PlantVillage dataset is essential to establish the real-world applicability and robustness of the proposed SDA-CAH framework. In response, we have substantially revised Section 4.1 (“Quantitative Results”, paragraphs 2 to 4, pages 19 to 22) and expanded the discussion to provide a more comprehensive and balanced analysis of real-world variability, potential limitations, and the model’s expected behavior under field conditions. The key improvements are summarized as follows:
1. Expanded Discussion on Real-World Variability: We explicitly recognize that the PlantVillage dataset comprises laboratory-captured images with uniform backgrounds, consistent illumination, and clearly visible lesions, which inherently simplify recognition tasks. The revised text now discusses how such controlled conditions may overestimate the model’s real-world performance. We emphasize that field images often contain background clutter, partial occlusion, lighting variations, and unseen disease types, all of which can challenge recognition robustness. This acknowledgment adds transparency and aligns the discussion with your recommendation for greater ecological and practical realism.
2. Analysis of SDA-CAH’s Potential under Field Conditions: We have added a detailed discussion explaining how the synergistic dual-augmentation (SDA) and class-aware hybrid (CAH) sampling mechanisms contribute to model resilience under noisy and heterogeneous visual conditions. Specifically, we point out that these modules enhance feature diversity, intra-class discriminability, and adaptability to unseen disease variants and varying image qualities. While no cross-dataset testing was performed in this study, this architectural design provides a solid foundation for cross-domain transferability, which we intend to validate in subsequent work.
3. Clear Statement on Future Cross-Dataset Validation: In the revised Section 4.1, we explicitly state that future evaluations will focus on field-collected datasets and cross-domain benchmarks to rigorously assess SDA-CAH’s robustness to noise, occlusion, illumination variance, and novel disease phenotypes. This statement demonstrates our commitment to validating generalization beyond the current dataset and provides a clear research trajectory toward practical deployment.
4. Enhanced Interpretability and Empirical Evidence: To complement this expanded discussion, we have added confusion matrix analyses (Figures 7–10) across different models to illustrate SDA-CAH’s superior intra-class discrimination and its ability to maintain performance even among visually similar disease categories. This empirical evidence reinforces the claim that SDA-CAH achieves balanced and discriminative learning, a property expected to translate positively to real-world conditions.
Through these revisions, we have strengthened the manuscript’s scientific transparency, ecological validity, and forward-looking scope. The revised discussion now provides a balanced interpretation of in-domain performance while explicitly addressing potential variability in field conditions and outlining a concrete roadmap for future cross-domain validation.
We sincerely appreciate your valuable comment, which prompted us to significantly enhance the depth and realism of our generalization analysis. We hope the expanded discussion now satisfactorily addresses this concern and further demonstrates the practical relevance of the SDA-CAH framework for real-world agricultural applications.
Comments 3: Comparison fairness: Please provide additional details on the hyperparameter settings for the baseline models (Xception, EfficientNet-B0) to ensure fair comparison.
Response 3: We sincerely thank you for this insightful and constructive suggestion. We fully agree that ensuring fair and transparent comparison across baseline models is critical for demonstrating the validity of our proposed SDA-CAH framework. In response, we have substantially expanded Section 3.3.1 (Implementation Details, paragraph 3, page 17) and Section 3.3.2 (Comparative Experiment, all paragraphs, page 18) to provide a comprehensive description of all hyperparameter configurations and training protocols used for both SDA-CAH and the baseline models (Xception, EfficientNet-B0, and MobileNetV2).
Specifically, we now clarify that identical hyperparameter settings were employed for both SDA-CAH + EfficientNet-B0 and the standard EfficientNet-B0 baseline to ensure a fair and unbiased comparison. The models were trained with a batch size of 64, an initial learning rate of 0.0005 (using the AdamW optimizer with weight decay of 1×10⁻³), gradient clipping (max_norm = 1.0), and a ReduceLROnPlateau scheduler (decay factor = 0.4, patience = 2). Each model was trained for 20 epochs with early stopping (patience = 5) to prevent overfitting. Additionally, we explicitly report the MixUp coefficient (α = 0.1), which dynamically adjusts the sample-mixing ratio to enhance feature discrimination. These configurations are now systematically summarized in Table 5 for clarity and reproducibility.
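For readers unfamiliar with the MixUp coefficient mentioned above, the following is a minimal sketch of standard MixUp, in which the mixing ratio is drawn from a Beta(α, α) distribution with α = 0.1. It uses plain Python lists in place of image tensors; the function name `mixup_pair` and the toy inputs are hypothetical, and the manuscript's actual implementation may differ.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.1, rng=random):
    """Blend two samples and their one-hot labels with ratio lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


rng = random.Random(42)
# Two toy "images" (flattened pixels) with one-hot labels for a 2-class problem.
mixed_x, mixed_y, lam = mixup_pair(
    [0.0, 0.0, 0.0], [1.0, 0.0],
    [1.0, 1.0, 1.0], [0.0, 1.0],
    alpha=0.1, rng=rng,
)
print(lam, mixed_x, mixed_y)
```

With a small α such as 0.1, the Beta distribution concentrates near 0 and 1, so most mixed samples stay close to one of the two originals and only occasionally become an even blend.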
Furthermore, detailed architectural configurations and hyperparameter settings for Xception (Table 6) and MobileNetV2 (Table 7) have also been included to ensure methodological transparency and experimental rigor. This revision guarantees that all comparative results are derived under consistent optimization conditions, emphasizing that the observed performance improvements of SDA-CAH arise from the proposed dual-augmentation and class-aware hybrid sampling strategies rather than unequal hyperparameter tuning.
We have also reinforced this discussion in Section 4.1 (Quantitative Results, paragraphs 2 to 4, pages 19 to 22), highlighting that the SDA-CAH framework, built upon the same EfficientNet-B0 backbone and identical training parameters, achieved a clear performance gain (accuracy: 99.95% vs. 99.35%, weighted F1-score: 99.89% vs. 99.32%), confirming the genuine effectiveness of the proposed mechanisms.
Together, these revisions ensure complete fairness, reproducibility, and transparency in model evaluation, thereby strengthening the credibility of our experimental comparisons. We sincerely appreciate your valuable recommendation, which has helped us significantly enhance the methodological integrity and scientific rigor of our work.
Comments 4: Figure quality: Some figures (especially confusion matrices and Grad-CAM visualizations) could be improved for readability. Please increase resolution and add clear color legends.
Response 4: We sincerely thank you for this valuable suggestion. We completely agree that clear, high-resolution visualizations are essential for improving interpretability and readability. In response, we have significantly enhanced Figures 5 (page 14), 7 (page 22), 8 (page 23), 9 (page 24), 10 (page 25), and 11 (page 26) throughout the revised manuscript.
Specifically, all confusion matrices (Figures 7–10) have been re-rendered at higher resolution (600 dpi) with refined color gradients to clearly distinguish between true-positive, false-positive, and false-negative regions. Each matrix now includes a precise color legend and numerical scale bar, facilitating easier quantitative interpretation of classification performance across all disease categories. Axis labels and tick marks have also been enlarged and standardized for visual consistency.
Similarly, the Grad-CAM visualizations (Figure 11) have been re-generated using higher-quality rendering with enhanced contrast and transparency adjustments to better highlight the lesion localization regions. A color bar legend has been added to each panel to indicate the relative activation intensity, thereby improving interpretability and reproducibility.
These graphical refinements substantially improve visual clarity and presentation quality without altering any underlying results. We truly appreciate your insightful comment, which helped us enhance the visual rigor and readability of the manuscript.
Comments 5: Language polishing: Minor grammatical and stylistic improvements are suggested, especially in the Abstract and Discussion sections (e.g., "the SDA-CAH not only surpasses..." → "SDA-CAH not only surpasses...").
Response 5: We sincerely thank you for this thoughtful and constructive comment. We fully agree that precise grammar and stylistic consistency are vital for enhancing readability and professionalism. In response, we have thoroughly polished the entire manuscript, with particular attention to the Abstract and Section 4 (Results and Discussion, pages 19-30), as suggested.
All grammatical inconsistencies, redundant articles, and stylistic irregularities (including the use of definite articles such as “the SDA-CAH”) have been carefully reviewed and corrected. We also refined sentence structures to improve clarity, academic tone, and logical flow throughout the text. The revised version now adheres more closely to standard scientific writing conventions, ensuring smoother readability and greater linguistic precision.
These revisions have notably improved the overall fluency and coherence of the manuscript while preserving its technical accuracy and scientific depth. We are deeply grateful to you for this valuable suggestion, which has helped us elevate the manuscript’s language quality and presentation to a more professional and polished standard.
Comments 6: Practical implications: The section on system implementation is valuable but could briefly include computational efficiency data (e.g., inference time or hardware resource consumption) to demonstrate practical feasibility.
Response 6: We sincerely thank you for this insightful suggestion. We fully agree that including computational efficiency metrics is essential to demonstrate the system’s real-world feasibility and deployment readiness.
In response, we have revised Section 3.2.6 (System Implementation, paragraph 1, page 13) to incorporate specific quantitative results on computational performance. In the updated version, we now report that the proposed plant disease diagnosis platform, based on the SDA-CAH framework, achieves an average inference time of 0.42 seconds per image on an NVIDIA GeForce GTX 1650 GPU. This new information highlights the system’s rapid response capability and lightweight computational footprint, confirming that it can efficiently support real-time disease detection and analysis in practical agricultural environments.
These additions strengthen the section by providing concrete evidence of the model’s operational efficiency and scalability. We are deeply grateful to you for this constructive comment, which has helped us clearly demonstrate the practical feasibility and deployability of the proposed platform.
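One common way to obtain an average per-image inference time such as the figure reported above is to time repeated calls after a few warm-up runs. The sketch below shows only that general procedure: `dummy_predict` is a hypothetical stand-in for the real model, the helper name `mean_latency_seconds` is illustrative, and the response does not state how the 0.42-second value was actually measured.

```python
import statistics
import time


def mean_latency_seconds(predict, inputs, warmup=3):
    """Average wall-clock time per call to predict, measured after warm-up calls."""
    for item in inputs[:warmup]:
        predict(item)  # warm-up runs are excluded from the timings
    timings = []
    for item in inputs:
        start = time.perf_counter()
        predict(item)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


def dummy_predict(image):
    # Hypothetical stand-in for model inference; returns a class id in [0, 38).
    return sum(image) % 38


images = [[i % 255 for i in range(1000)] for _ in range(20)]
avg = mean_latency_seconds(dummy_predict, images)
print(f"average latency: {avg:.6f} s per image")
```

When timing a GPU model, the device should be synchronized before each clock reading, because GPU kernels are typically launched asynchronously and the call can return before the computation has finished.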
We sincerely thank you for the thoughtful feedback and kind recognition of our work's contributions. Your valuable insights have greatly strengthened the manuscript in terms of scientific depth, clarity, and practical significance. We truly appreciate the time and effort you invested in reviewing our study. Should there be any remaining issues or areas requiring further refinement, we would be more than willing to make additional revisions based on your guidance.
With kind regards,
Ran Wang
Shandong University of Technology
Zibo, Shandong, China
Email: 24505040727@stumail.sdut.edu.cn
Reviewer 2 Report
Comments and Suggestions for Authors
The research follows the hot topics in AI studies and demonstrates a certain degree of innovation. It is recommended to accept the paper after revisions.
- The introduction section is too lengthy. The introduction and research progress sections exceed 6 pages. Please reduce them by half.
- In Section 3.3.1 of the paper, Table 5 shows that the model was trained for 20 epochs. However, in other parts of the paper, it is mentioned that the model was trained for 10 epochs. So, was the final model trained for 10 or 20 epochs?
- Plant Village is a laboratory dataset, but the paper's title and abstract repeatedly emphasize applications in "precision agriculture" and the "real-world," which represents a significant disconnect.
- Merely comparing the model with Xception and the standard EfficientNet-B0 is far from adequate.
- Regarding the curve in Figure 6, in terms of the number of training epochs and accuracy, has the model truly not experienced overfitting?
- The visual framework of the paper is not clearly explained, with only a few flowcharts provided.
- It needs to be clarified whether the dataset used in the paper is a self-built dataset or a publicly available dataset.
Author Response
Dear Reviewer,
We would like to express our heartfelt appreciation for your thoughtful and detailed review of our manuscript. Your insightful comments and constructive suggestions have been invaluable in helping us enhance the scientific rigor, clarity, and overall quality of this work. We are truly grateful for the time and effort you dedicated to providing such comprehensive feedback. In response, we have carefully revised the manuscript and addressed each point with great attention and sincerity. Below, we present our detailed, point-by-point responses to all your comments.
Comments 1: The introduction section is too lengthy. The introduction and research progress sections exceed 6 pages. Please reduce them by half.
Response 1: We sincerely appreciate your valuable suggestion regarding the excessive length of the Introduction (pages 2-3) and Related Work (pages 2-5) sections. Following this advice, we carefully revised and condensed both sections to improve focus and readability. Specifically, we streamlined background descriptions, merged overlapping content, and emphasized the most relevant literature and research context. As a result, the Introduction and Related Work sections were reduced by approximately 50%, without compromising the technical depth or logical flow of the paper. These revisions have made the manuscript more concise, balanced, and easier to follow. We are deeply grateful for this insightful comment, which has greatly enhanced the overall clarity and presentation of our work.
Comments 2: In Section 3.3.1 of the paper, Table 5 shows that the model was trained for 20 epochs. However, in other parts of the paper, it is mentioned that the model was trained for 10 epochs. So, was the final model trained for 10 or 20 epochs?
Response 2: We sincerely thank you for carefully identifying the inconsistency regarding the number of training epochs. We have thoroughly reviewed the manuscript and clarified this point in Section 3.3.1 (Implementation Details, page 16). All models in this study were trained with a budget of 20 epochs, which aligns with the settings reported in Table 5. The previous mention of 10 epochs was an inadvertent oversight during manuscript drafting. This inconsistency has now been corrected throughout the paper to ensure full coherence and accuracy. We greatly appreciate your attention to this detail, which helped us further improve the clarity and precision of our work.
Comments 3: Plant Village is a laboratory dataset, but the paper's title and abstract repeatedly emphasize applications in "precision agriculture" and the "real-world," which represents a significant disconnect.
Response 3: We sincerely thank you for this insightful and constructive comment regarding the potential disconnect between the PlantVillage dataset and the emphasis on “precision agriculture” and “real-world” applications. In response, we have carefully revised both the Abstract and the Title to ensure conceptual and contextual consistency. Specifically, the revised Abstract now explicitly clarifies that our experimental evaluations are conducted on the PlantVillage dataset while emphasizing that the proposed SDA-CAH framework and its accompanying diagnostic system are designed with real-world deployment and scalability in mind. The updated title — “Deep Learning-Driven Plant Pathology Assistant: Enabling Visual Diagnosis with AI-Powered Focus and Remediation Recommendations for Precision Agriculture” — now better reflects this alignment by highlighting the system’s practical utility rather than overstating field deployment. These revisions more accurately position our work as a bridge between laboratory research and its translation into precision agricultural applications. We deeply appreciate your valuable feedback, which has helped us present the scope and significance of our study with greater precision and clarity.
Comments 4: Merely comparing the model with Xception and the standard EfficientNet-B0 is far from adequate.
Response 4: We sincerely thank you for highlighting the need for a more comprehensive comparison beyond Xception and standard EfficientNet-B0. In response, we have substantially expanded our comparative experiments to include three representative state-of-the-art convolutional neural networks: Xception, EfficientNet-B0, and MobileNetV2. All models utilized ImageNet pre-trained weights, ensuring fair evaluation of feature transferability and highlighting the distinct contribution of the proposed SDA-CAH framework.
The revised design provides several key improvements:
Expanded Baseline Selection:
Xception, built on depthwise separable convolutions, excels in capturing fine-grained textural variations and subtle lesion morphology, making it highly effective for differentiating visually similar disease categories.
EfficientNet-B0, known for its compound scaling strategy, offers a strong balance between parameter efficiency and representational capacity but lacks specific mechanisms for handling inter-class imbalance.
MobileNetV2, a lightweight architecture, employs inverted residuals and linear bottlenecks to achieve a favorable trade-off between computational efficiency and accuracy, representing a practical benchmark for resource-constrained deployment scenarios.
Rigorous Hyperparameter Strategy: To ensure each model achieves its best achievable performance, we employed empirically optimized hyperparameters individually for each baseline, including learning rate, optimizer, batch size, and regularization settings, rather than enforcing identical configurations. This approach ensures a fair yet performance-maximized comparison, avoiding artificially constrained results and providing a scientifically rigorous assessment of SDA-CAH’s advantages.
Comprehensive Quantitative Evaluation: On the PlantVillage test set, SDA-CAH + EfficientNet-B0 achieved 99.95% accuracy, 99.89% F1-score, and 99.89% recall, outperforming Xception (99.42%, 99.39%, 99.41%), EfficientNet-B0 (99.35%, 99.32%, 99.33%), and MobileNetV2 (95.77%, 94.52%, 94.77%). These results demonstrate unequivocal superiority, with SDA-CAH delivering substantial gains across all metrics, particularly in rare disease classes (<2% of the data), highlighting the effectiveness of the synergistic dual-augmentation and class-aware hybrid sampling mechanisms.
Extended Analysis: We provide detailed confusion matrices for all baselines and SDA-CAH (Figures 7–10), showing near-perfect classification along the main diagonal and minimal misclassification, even for visually similar categories.
We also included computational efficiency metrics (inference speed and training time) in Tables 8 and 9, demonstrating that SDA-CAH achieves superior predictive performance without compromising efficiency or scalability, making it suitable for deployment in precision agriculture systems.
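The accuracy, recall, and F1 figures quoted above are all derivable from a confusion matrix of the kind shown in Figures 7–10. The sketch below computes them for a small made-up 3-class matrix (rows are true classes, columns are predictions). The numbers are illustrative only, and the choice of macro-averaged recall here is an assumption, since the response does not state which averaging mode was used for recall.

```python
def per_class_counts(cm):
    """Return (tp, fp, fn) per class; rows are true labels, columns are predictions."""
    n = len(cm)
    stats = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(cm[r][k] for r in range(n)) - tp
        stats.append((tp, fp, fn))
    return stats


def summarize(cm):
    """Return (accuracy, macro recall, support-weighted F1) for a confusion matrix."""
    total = sum(sum(row) for row in cm)
    stats = per_class_counts(cm)
    accuracy = sum(tp for tp, _, _ in stats) / total
    recalls, weighted_f1 = [], 0.0
    for tp, fp, fn in stats:
        support = tp + fn
        recall = tp / support if support else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        weighted_f1 += f1 * support / total  # weight each class by its support
    macro_recall = sum(recalls) / len(recalls)
    return accuracy, macro_recall, weighted_f1


# Made-up 3-class confusion matrix, for illustration only.
cm = [[50, 0, 0],
      [1, 28, 1],
      [0, 2, 18]]
accuracy, macro_recall, weighted_f1 = summarize(cm)
print(accuracy, macro_recall, weighted_f1)
```

Note that support-weighted recall is always identical to accuracy, so a recall value that differs from accuracy implies some other averaging, such as the macro average used in this sketch.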
Collectively, these revisions not only address your concern about insufficient baseline comparison but also provide a more realistic and scientifically rigorous evaluation of SDA-CAH’s performance. By incorporating multiple architectures, optimized hyperparameters, and extended performance metrics, we firmly establish the SDA-CAH framework as a state-of-the-art, interpretable, and practically deployable solution for plant disease recognition.
We greatly appreciate your suggestion, which has strengthened the rigor, comprehensiveness, and practical relevance of our comparative analysis.
Comments 5: Regarding the curve in Figure 6, in terms of the number of training epochs and accuracy, has the model truly not experienced overfitting?
Response 5: We sincerely thank you for the insightful comment regarding potential overfitting as observed in Figure 6. To clarify, the y-axis in Figure 6 is scaled with a fine interval of 0.02, which provides a highly detailed view of the training and validation accuracy curves. This fine-grained scaling may give the visual impression of minor fluctuations, but in fact, the curves demonstrate stable and nearly identical trends for both training and validation sets throughout the 20 training epochs.
Moreover, an early-stopping mechanism with a patience of 5 epochs was employed during training to prevent overfitting. The combination of this regularization strategy, the synergistic dual-augmentation, and the class-aware hybrid sampling ensures that the model consistently learns robust and discriminative features without memorizing the training data. The small deviations observable in the figure are within normal stochastic variations inherent to mini-batch gradient descent and do not indicate actual overfitting.
In summary, Figure 6 reflects the rapid convergence and stable generalization of the SDA-CAH framework. We believe that the high fidelity of training and validation curves, coupled with the early-stopping mechanism, convincingly demonstrates that the model maintains robust performance throughout the entire training process.
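The interaction between the early-stopping rule (patience of 5 epochs) and the plateau-based learning-rate schedule (decay factor 0.4, patience 2, initial rate 0.0005) can be sketched as a small simulation over validation losses. This is a simplified illustration: the loss values are invented, the function name `simulate_training` is hypothetical, and the plateau rule omits the threshold and cooldown options that full scheduler implementations usually provide.

```python
def simulate_training(val_losses, lr=5e-4, factor=0.4,
                      lr_patience=2, stop_patience=5, max_epochs=20):
    """Replay validation losses through plateau LR decay and early stopping."""
    best = float("inf")
    bad_for_lr = 0    # non-improving epochs since the last LR decay or improvement
    bad_for_stop = 0  # non-improving epochs since the last improvement
    history = []
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            bad_for_lr = 0
            bad_for_stop = 0
        else:
            bad_for_lr += 1
            bad_for_stop += 1
        if bad_for_lr > lr_patience:  # decay once patience is exceeded
            lr *= factor
            bad_for_lr = 0
        history.append((epoch, loss, lr))
        if bad_for_stop >= stop_patience:  # stop after 5 stagnant epochs
            break
    return best, history


# Invented validation losses: improvement for four epochs, then a plateau.
losses = [0.30, 0.12, 0.06, 0.04, 0.05, 0.05, 0.06, 0.05, 0.07, 0.08, 0.09]
best, history = simulate_training(losses)
for epoch, loss, lr in history:
    print(epoch, loss, lr)
```

In this toy run the best loss occurs at epoch 4, the learning rate is reduced once at epoch 7, and training stops at epoch 9, so the model kept is the one from before the plateau rather than the last one trained.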
We appreciate your attention to detail, which has given us the opportunity to clarify this important aspect of model behavior.
Comments 6: The visual framework of the paper is not clearly explained, with only a few flowcharts provided.
Response 6: We sincerely thank you for the valuable suggestion regarding the clarity of the visual framework. To address this concern, we have substantially enhanced the explanations in Sections 3.2.4 (Model Architecture, pages 11-12) and 3.2.6 (System Implementation, pages 13-15), and expanded the corresponding figures (Figures 4 and 5) to provide a comprehensive and interpretable overview of the SDA-CAH framework and its practical deployment.
Specifically, the SDA-CAH architecture is now described as a modular, end-to-end visual framework, integrating five core stages: input preprocessing, dual augmentation, class-aware hybrid sampling, deep feature extraction, and stability-oriented optimization (Figure 4). Each stage is carefully elaborated:
The Synergistic Dual Augmentation (SDA) module combines geometric and photometric transformations to enhance intra-class diversity while preserving lesion morphology.
The Class-Aware Hybrid (CAH) sampling module ensures balanced representation of both minority and majority classes, mitigating class imbalance.
The EfficientNet-B0 backbone is adapted for 38-class PlantVillage classification, retaining robust low-level visual priors through ImageNet pretraining.
Optimization strategies, including label-smoothing cross-entropy, AdamW optimizer, gradient clipping, and dynamic learning rate scheduling, are incorporated to ensure stable convergence and generalization.
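The idea of class-aware sampling in the list above can be illustrated with per-sample sampling weights. The response does not specify how the CAH module combines its components, so the sketch below is only one plausible reading, labeled as an assumption: it blends uniform instance sampling with class-balanced sampling through a hypothetical `mix` parameter. The function name and toy labels are likewise illustrative.

```python
import random
from collections import Counter


def hybrid_sample_weights(labels, mix=0.5):
    """Per-sample probabilities blending class-balanced and uniform sampling.

    mix=0.0 gives plain uniform sampling over images; mix=1.0 gives every
    class the same total probability regardless of its size. (Assumed scheme.)
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    weights = []
    for label in labels:
        uniform = 1.0 / n_samples
        balanced = 1.0 / (n_classes * counts[label])
        weights.append(mix * balanced + (1.0 - mix) * uniform)
    return weights


# Toy imbalanced dataset: the rare class holds 10% of the images.
labels = ["common"] * 900 + ["rare"] * 100
weights = hybrid_sample_weights(labels, mix=0.5)
rare_share = sum(w for w, lab in zip(weights, labels) if lab == "rare")

rng = random.Random(42)
batch = rng.choices(labels, weights=weights, k=64)  # draw one batch with replacement
print(round(rare_share, 3), Counter(batch))
```

Under this assumed scheme with `mix=0.5`, the rare class is drawn with a total probability of about 0.30 instead of its raw share of 0.10, while the majority class still supplies most of each batch.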
In addition, we have elaborated the system-level implementation (Figure 5), detailing the deployment of SDA-CAH in a web-based plant disease diagnosis platform. This description covers:
Client–server architecture with responsive frontend and GPU-accelerated backend for efficient, low-latency inference.
End-to-end workflow from image upload/capture, preprocessing, inference, Grad-CAM visualization, to actionable remediation recommendations.
Containerized deployment using Docker and NGINX for scalable, reproducible, and secure operation across diverse environments.
Through these updates, the visual framework now clearly illustrates the coordinated interactions between algorithmic modules, feature extraction, augmentation, and optimization strategies, as well as their integration into a practical, user-centric platform. These modifications provide a transparent, interpretable, and reproducible overview, highlighting both the methodological rigor and the practical feasibility of our approach for real-world precision agriculture.
We believe that these enhancements significantly improve the clarity and accessibility of the framework, and we sincerely appreciate your insightful suggestion, which has helped us better communicate the structure and functionality of SDA-CAH.
Comments 7: It needs to be clarified whether the dataset used in the paper is a self-built dataset or a publicly available dataset.
Response 7: We sincerely thank you for highlighting the need to clarify the dataset source. To address this, we have updated Section 3.1 (Dataset, Page 19-30, paragraph 1 of Section 3.1) to explicitly state that our study employs the publicly available PlantVillage dataset, rather than a self-built dataset.
The PlantVillage dataset comprises 54,305 color images across 38 disease categories spanning 14 crop species (e.g., tomato, maize, grape) and multiple disease types (e.g., leaf spot, rust). Its large scale and multi-class nature provide an ideal benchmark for evaluating model robustness, generalization, and interpretability. To better illustrate its characteristics, we present representative sample images (Figure 1), the class distribution (Figure 2), and detailed statistical counts (Table 1). Analysis of the dataset reveals pronounced class imbalance, with several rare categories containing fewer than 2% of the total images, highlighting the necessity and novelty of the proposed class-aware hybrid sampling mechanism.
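To make the idea of class-aware hybrid sampling concrete, the sketch below shows one plausible reading of it: class sampling probabilities are a blend of the raw class frequencies and a uniform distribution over classes. The response does not spell out the actual mechanism, so the blending rule, the `alpha` parameter, the function names, and the toy labels are all assumptions for illustration only.

```python
import random
from collections import Counter


def hybrid_class_probs(counts, alpha=0.5):
    """Blend instance-balanced and class-balanced sampling.

    alpha=0 reproduces the raw class frequencies; alpha=1 samples every
    class equally often. alpha=0.5 is an assumed value.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {c: (1 - alpha) * n / total + alpha / num_classes
            for c, n in counts.items()}


def per_sample_weights(labels, alpha=0.5):
    """Weight for each sample so that classes are drawn with the blended probabilities."""
    counts = Counter(labels)
    probs = hybrid_class_probs(counts, alpha)
    return [probs[y] / counts[y] for y in labels]


# Toy example (hypothetical labels, not PlantVillage statistics).
labels = ["common"] * 90 + ["rare"] * 10
weights = per_sample_weights(labels, alpha=0.5)
batch = random.choices(range(len(labels)), weights=weights, k=32)
```

In this toy case the rare class makes up 10% of the data but is drawn with probability 0.3, while the common class is still sampled most of the time.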
By employing this widely recognized, publicly available dataset, our study ensures reproducibility and comparability with other plant disease recognition research. Additionally, the dataset’s diversity and openness allow the SDA-CAH framework to fully leverage its intrinsic properties, demonstrating robust performance and practical potential for precision agriculture.
We believe this clarification clearly addresses your concern and further emphasizes the rigor, reproducibility, and real-world relevance of our work.
We sincerely thank you for the thoughtful feedback and kind recognition of our work’s contributions. Your valuable insights have greatly strengthened the manuscript in terms of scientific depth, clarity, and practical significance. We truly appreciate the time and effort you invested in reviewing our study. Should there be any remaining issues or areas requiring further refinement, we would be more than willing to make additional revisions based on your guidance.
With kind regards,
Ran Wang
Shandong University of Technology
Zibo, Shandong, China
Email: 24505040727@stumail.sdut.edu.cn
Reviewer 3 Report
Comments and Suggestions for Authors
Overall, I would like to report that the work left a positive impression, and many of the comments can be taken as recommendations for future work.
The main issue addressed in this article is improving the efficiency of precision farming by identifying plant pathologies. The tools for achieving this are visual inspection followed by processing by deep learning neural networks.
Ensuring food security in the face of population growth and general urbanization is a major challenge. Early, accurate identification of plant diseases can significantly reduce crop losses in both production volume and quality. Therefore, I believe the article addresses a relevant topic, and the title and content are consistent with the journal's scope.
The authors present an approach to improving the existing neural network architecture to improve performance.
The proposed solution allows for a wide range of crops and diseases to be studied.
The authors reviewed a sufficient data set, presented research methodology, and outlined ways to improve the structure of existing networks. The authors' efforts to normalize the data during model development and validation are particularly noteworthy.
A number of comments regarding the material are aimed at improving its readability:
1. The authors' review of existing neural networks used for plant disease identification is somewhat superficial. They do not compare existing models with each other based on measurable criteria, nor do they compare them with the requirements for performing plant disease identification procedures. The existing models reviewed should have been presented in tabular form with a comparison of key indicators.
2. The models reviewed are of comparable quality, making it unclear what the advantage of the new model will be. Furthermore, the conclusions do not sufficiently clearly convey the advantages of the proposed solution over existing ones. They fail to highlight not only the advantages of the machine model, but also the technology's economic and energy efficiency.
3. Figures 5, 7-9 require revision. In their current form, they are practically illegible.
4. Attention should have been paid not only to the software but also to the technical implementation.
5. The conclusions should include comparisons and advantages over existing solutions.
Overall, the article made a positive impression.
Author Response
Dear Reviewer,
We would like to express our heartfelt appreciation for your thoughtful and detailed review of our manuscript. Your insightful comments and constructive suggestions have been invaluable in helping us enhance the scientific rigor, clarity, and overall quality of this work. We are truly grateful for the time and effort you dedicated to providing such comprehensive feedback. In response, we have carefully revised the manuscript and addressed each point with great attention and sincerity. Below, we present our detailed, point-by-point responses to all your comments.
Comments 1: The authors' review of existing neural networks used for plant disease identification is somewhat superficial. They do not compare existing models with each other based on measurable criteria, nor do they compare them with the requirements for performing plant disease identification procedures. The existing models reviewed should have been presented in tabular form with a comparison of key indicators.
Response 1: We sincerely thank you for highlighting the need for a more thorough and quantitative comparison of existing neural networks for plant disease recognition. To address this concern, we have extensively revised Sections 2 (Related Work, Pages 3-5) and 4.1 (Quantitative Results, Pages 19-21).
First, in the Related Work section, we now provide a systematic review of both traditional and deep learning–based plant disease recognition methods, including key classical approaches (SVM, Random Forest, KNN with handcrafted features) and recent CNN architectures (VGG16, DenseNet, EfficientNet variants, dual-stream CNNs, and attention-enhanced models). We discuss each method’s underlying principles, advantages, and limitations, highlighting critical challenges such as class imbalance, sensitivity to illumination and occlusion, and limited interpretability. This narrative contextualizes the motivations for our SDA-CAH framework and clarifies why conventional methods and prior CNNs may struggle under complex agricultural conditions.
Second, to provide quantitative, side-by-side comparisons, we introduced two new tables (Tables 8 and 9):
Table 8 presents a direct comparison of the proposed SDA-CAH framework against three mainstream baseline CNNs (Xception, standard EfficientNet-B0, and MobileNetV2) on the PlantVillage dataset. We report not only accuracy, F1-score, and recall, but also inference speed and training time, enabling a more holistic assessment of both predictive performance and computational efficiency. This demonstrates SDA-CAH’s clear superiority across all metrics, with robust gains over baselines while maintaining low latency and lightweight model size.
Table 9 provides a broader literature-based comparison of representative state-of-the-art plant disease recognition methods from recent years. This table includes dataset, classifier type, accuracy, F1-score, recall, parameter count, and model size, allowing readers to objectively evaluate each model against the requirements of practical plant disease identification, including predictive accuracy, efficiency, and deployability in resource-constrained agricultural environments.
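For readers comparing the accuracy, F1-score, and recall columns in these tables, the following sketch shows how the three quantities can be derived from a confusion matrix. Macro averaging over classes is an assumption here, since the response does not state which averaging mode was used, and the function name and the 2x2 matrix are hypothetical.

```python
def macro_metrics(cm):
    """Accuracy, macro recall, and macro F1 from a square confusion matrix.

    cm[i][j] is the number of samples with true class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    recalls, f1s = [], []
    for i in range(k):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                          # true class i, predicted otherwise
        fp = sum(cm[r][i] for r in range(k)) - tp     # predicted i, true class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
    return {
        "accuracy": correct / total,
        "macro_recall": sum(recalls) / k,
        "macro_f1": sum(f1s) / k,
    }


# Hypothetical two-class example.
result = macro_metrics([[8, 2], [1, 9]])
```

Because macro averaging weights every class equally, errors on rare classes lower macro F1 and recall more than they lower overall accuracy, which is one reason the three numbers need not coincide on an imbalanced dataset.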
These additions ensure that our review is not merely descriptive but measurable and critically comparative, addressing your concern regarding tabular presentation and systematic evaluation. We believe that by combining a rigorous literature survey with empirical benchmarking on a publicly available dataset, the manuscript now clearly demonstrates the unique advantages and practical relevance of SDA-CAH over existing approaches.
We hope that these revisions provide the clarity, depth, and quantitative rigor you sought, and convincingly establish SDA-CAH as a high-performance, interpretable, and deployable framework for precision agriculture.
Comments 2: The models reviewed are of comparable quality, making it unclear what the advantage of the new model will be. Furthermore, the conclusions do not sufficiently clearly convey the advantages of the proposed solution over existing ones. They fail to highlight not only the advantages of the machine model, but also the technology's economic and energy efficiency.
Response 2: We sincerely thank you for pointing out the need to more clearly emphasize the advantages of the proposed SDA-CAH framework over existing models, not only in terms of predictive performance but also regarding economic and energy efficiency. In response, we have substantially revised Sections 4.1 (Pages 19-22), 4.2 (Pages 25 and 26), and 5.1 (Page 31), incorporating both quantitative benchmarking and qualitative interpretability analysis, as well as highlighting computational and deployment efficiency.
Superior predictive performance: We conducted a comprehensive evaluation on the PlantVillage test set, reporting accuracy, F1-score, and recall for SDA-CAH and three mainstream baselines (Xception, standard EfficientNet-B0, and MobileNetV2). The results demonstrate that SDA-CAH achieves 99.95% accuracy, 99.89% F1-score, and 99.89% recall, outperforming Xception (99.42%, 99.39%, 99.41%), standard EfficientNet-B0 (99.35%, 99.32%, 99.33%), and MobileNetV2 (95.77%, 94.52%, 94.77%). These gains stem from the synergistic dual augmentation and class-aware hybrid sampling, which effectively enhance adaptability to intra-class variability and mitigate class imbalance, particularly for rare disease categories (<2% of the dataset). Confusion matrix analysis further confirms SDA-CAH’s balanced and robust classification across nearly all disease categories, including visually similar classes.
Interpretability and transparency: We integrated an enhanced Grad-CAM–based visualization in our framework. High-resolution heatmaps accurately localize disease-affected regions, providing actionable diagnostic insights. This demonstrates that SDA-CAH not only delivers superior accuracy but also enhances model transparency and trustworthiness, addressing the “black-box” limitation of conventional deep learning approaches.
Economic and energy efficiency: Beyond predictive metrics, SDA-CAH is designed with practical deployment in mind. The framework maintains a lightweight configuration (5.33 M parameters, 21 MB) and achieves minimal inference delay (≈70 ms per image on an NVIDIA T4 GPU) without compromising accuracy. Training time is effectively reduced through the integrated augmentation and sampling strategies, and the model exhibits low computational overhead, ensuring both energy-efficient operation and cost-effective deployment on mobile and edge devices.
Comprehensive system-level advantage: The SDA-CAH framework is operationalized in a user-friendly diagnostic platform, supporting real-time inference, Grad-CAM visualization, and expert-curated remediation guidance. This end-to-end design bridges the gap between algorithmic performance and practical usability, demonstrating scalable, environmentally sustainable, and economically feasible deployment in precision agriculture.
Summary of contributions: Collectively, SDA-CAH provides a unique combination of high accuracy, balanced representation learning, interpretable outputs, and energy-efficient deployment, clearly surpassing existing methods in both theoretical performance and practical applicability. These enhancements establish SDA-CAH as a new benchmark for next-generation plant disease recognition, offering measurable advantages over prior state-of-the-art models in accuracy, interpretability, and resource efficiency.
We believe that these revisions comprehensively clarify the distinct advantages of SDA-CAH, addressing your concern and highlighting its transformative potential for both academic research and real-world agricultural applications.
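The efficiency figures quoted above can be sanity-checked with simple arithmetic. The sketch below assumes that weights are stored as 32-bit floats (4 bytes each) and that images are processed one at a time; both are assumptions, and the function names are hypothetical.

```python
def model_size_mb(num_params, bytes_per_param=4):
    """Approximate storage footprint in megabytes (1 MB = 10**6 bytes).

    bytes_per_param=4 assumes float32 weights.
    """
    return num_params * bytes_per_param / 1e6


def throughput_per_second(latency_ms):
    """Images per second if requests are served sequentially at this latency."""
    return 1000.0 / latency_ms


size = model_size_mb(5.33e6)       # about 21.3 MB, in line with the reported ~21 MB
rate = throughput_per_second(70)   # about 14.3 images per second
```

Under these assumptions, 5.33 M parameters correspond to roughly 21.3 MB, which agrees with the stated model size, and a 70 ms latency corresponds to roughly 14 images per second without batching.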
Comments 3: Figures 5, 7-9 require revision. In their current form, they are practically illegible.
Response 3: We sincerely thank you for highlighting the legibility issues in Figures 5 (Page 14) and 7–10 (Pages 22-25). In response, we have enhanced the clarity and visual quality of all the affected figures. Specifically:
Figure 5 (System Architecture):
We improved the resolution and contrast of all components.
Node labels, arrows, and module descriptions have been enlarged and bolded for clear readability, even in printed form.
Color schemes were adjusted to ensure high contrast between different modules and data flows, emphasizing the end-to-end diagnostic pipeline and interactions between SDA, CAH, EfficientNet, and the optimization loop.
Figures 7–10 (Confusion Matrices for Xception, EfficientNet-B0, MobileNetV2 and SDA-CAH+EfficientNet-B0):
The heatmaps have been rescaled to high resolution, with a larger font size for both row and column labels.
Color gradients were optimized to enhance contrast between correct classifications and misclassifications, making subtle differences clearly discernible.
Numerical values inside each cell have been bolded and sized appropriately, ensuring that performance metrics across all classes can be easily interpreted.
These enhancements ensure that all figures are fully legible, self-explanatory, and publication-ready, providing a clear and accurate visualization of both the system workflow and comparative model performance. We believe that the revised figures now effectively communicate the model architecture, classification results, and interpretability analyses, addressing your concerns.
We hope the improvements make the figures much clearer and more informative, and we welcome any further suggestions to enhance their presentation.
Comments 4: Attention should have been paid not only to the software but also to the technical implementation.
Response 4: We sincerely thank you for emphasizing the importance of detailing the technical implementation alongside the software platform. In response, we have substantially expanded Section 3.2.6 (System Implementation, Pages 13-15) to provide a thorough account of both software and engineering aspects.
Specifically, we now describe:
Platform Architecture:
The system employs a client–server architecture with a responsive frontend built using HTML5, CSS, JavaScript, and Bootstrap, and a backend implemented in Flask.
It supports image uploads, real-time inference, Grad-CAM heatmap visualization, and expert-curated remediation recommendations, ensuring an intuitive and interactive user experience across diverse devices, including smartphones, tablets, and desktops.
Technical optimizations, such as batch processing, asynchronous request handling, GPU acceleration, and model weights preloading, ensure low-latency, resource-efficient inference (0.42 seconds per image on an NVIDIA GeForce GTX 1650).
Algorithmic Integration:
The platform operationalizes the SDA-CAH framework as a fully integrated end-to-end workflow.
Images are preprocessed (resized, normalized, color-adjusted) and inference is executed in parallel, while Grad-CAM heatmaps are generated for precise, interpretable visualization of disease-affected regions.
Predictions are coupled with actionable, expert-curated recommendations, bridging algorithmic output with practical agricultural guidance.
Deployment and Scalability:
The platform is fully containerized using Docker and employs NGINX for load balancing, SSL termination, and request routing.
Continuous integration and GPU-enabled containers support efficient, reproducible deployment across cloud environments.
Logging, monitoring, input validation, and offline caching mechanisms ensure robustness, security, and usability under real-world conditions.
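The Grad-CAM step mentioned under Algorithmic Integration can be sketched in a few lines. The code below implements the standard Grad-CAM combination rule on plain nested lists; it is not the enhanced variant described in the manuscript, and it assumes the feature maps and their gradients have already been extracted from the network. The function name and the toy inputs are hypothetical.

```python
def grad_cam(activations, gradients):
    """Standard Grad-CAM heatmap from K feature maps and their gradients.

    activations, gradients: lists of K maps, each an H x W nested list.
    Returns an H x W heatmap scaled to [0, 1].
    """
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weights: global average of each gradient map.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum of the feature maps, followed by ReLU.
    cam = [[max(0.0, sum(wk * a[i][j] for wk, a in zip(weights, activations)))
            for j in range(w)]
           for i in range(h)]
    # Normalize so the strongest response equals 1.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy input: channel 0 supports the class, channel 1 opposes it.
acts = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
heatmap = grad_cam(acts, grads)
```

In the toy input only the top-left location survives the ReLU, so the heatmap highlights that cell and suppresses the location supported only by the negatively weighted channel. In a deployed system the heatmap would then be upsampled to the input resolution and overlaid on the leaf image.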
By presenting these details, we demonstrate the seamless integration of high-performance modeling with engineering optimization, ensuring that the SDA-CAH framework is not only a state-of-the-art research model but also a practically deployable, resource-efficient, and interpretable system suitable for real-world precision agriculture.
We hope that these detailed technical descriptions adequately address your concerns and clearly convey the practical, scalable, and energy-efficient implementation of our platform.
Comments 5: The conclusions should include comparisons and advantages over existing solutions.
Response 5: We sincerely thank you for highlighting the need to clearly articulate the advantages of our proposed solution over existing approaches. In response, we have substantially revised Sections 5.1 (Summary of Contributions, Page 31) and 5.2 (Limitations and Future Work, Page 31) to explicitly demonstrate the comparative benefits and practical relevance of the SDA-CAH framework.
Specifically, we now emphasize that:
Algorithmic and Predictive Advantages:
SDA-CAH integrates optimized MixUp augmentation with a customized Albumentations pipeline, preserving subtle lesion characteristics, and employs a class-aware hybrid sampling mechanism to mitigate class imbalance.
On the publicly available PlantVillage dataset, SDA-CAH built upon EfficientNet-B0 achieved 99.95% accuracy, 99.89% F1-score, and 99.89% recall, outperforming state-of-the-art baselines including Xception, standard EfficientNet-B0, and MobileNetV2.
These results demonstrate not only higher overall accuracy but also enhanced recognition of rare disease categories, addressing limitations in existing solutions.
Interpretability and Practical Utility:
Grad-CAM visualizations provide high-resolution, disease-specific heatmaps, enabling precise localization of pathological regions and actionable diagnostic insights.
The integrated platform delivers real-time inference, visual explanations, and expert-curated remediation recommendations, lowering technical barriers for practical agricultural deployment.
Economic and Energy Efficiency:
The SDA-CAH framework achieves state-of-the-art performance with only 5.33M parameters and a lightweight storage footprint (~21 MB), with minimal inference delay (~70 ms per image).
This combination of high accuracy, low computational overhead, and fast inference makes the system cost-effective, environmentally sustainable, and suitable for deployment on mobile and edge devices.
Comparative Advantage over Existing Solutions:
Beyond accuracy, SDA-CAH demonstrates superior F1-score, rare disease recognition, interpretability, computational efficiency, and energy economy compared with prior methods, establishing a new benchmark for next-generation precision agriculture applications.
Future Directions:
We explicitly acknowledge current limitations, including lack of cross-dataset or field validation and reliance on the laboratory-biased PlantVillage dataset, and outline plans for heterogeneous field dataset validation, lightweight mobile deployment, multimodal data integration, GIS-based treatment recommendations, and IoT-enabled real-time monitoring, to further enhance robustness, practical utility, and global applicability.
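For context on the MixUp augmentation named under Algorithmic and Predictive Advantages, the sketch below shows the standard MixUp rule, in which two samples and their one-hot labels are combined with a coefficient drawn from a Beta distribution. It is the textbook formulation rather than the optimized variant the manuscript describes, and `alpha=0.2`, the function name, and the toy vectors are assumptions for illustration.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Standard MixUp: convex combination of two samples and their labels.

    x_a, x_b: flat feature (pixel) vectors of equal length.
    y_a, y_b: one-hot label vectors of equal length.
    alpha=0.2 is an assumed Beta-distribution parameter.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


# Toy example with a seeded generator for repeatability.
rng = random.Random(0)
x_mix, y_mix, lam = mixup([1.0, 0.0, 0.5], [1, 0], [0.0, 1.0, 0.5], [0, 1], rng=rng)
```

The mixed label remains a valid probability distribution, which is why MixUp pairs naturally with a soft-target loss such as label-smoothing cross-entropy.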
Collectively, these revisions clearly communicate both the algorithmic superiority and practical deployment advantages of SDA-CAH, addressing your concerns and highlighting its comprehensive value over existing plant disease recognition solutions.
We sincerely thank you for the thoughtful feedback and kind recognition of our work’s contributions. Your valuable insights have greatly strengthened the manuscript in terms of scientific depth, clarity, and practical significance. We truly appreciate the time and effort you invested in reviewing our study. Should there be any remaining issues or areas requiring further refinement, we would be more than willing to make additional revisions based on your guidance.
With kind regards,
Ran Wang
Shandong University of Technology
Zibo, Shandong, China
Email: 24505040727@stumail.sdut.edu.cn
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
Comments are addressed.
