Article
Peer-Review Record

Synthetic Data-Driven Methods to Accelerate the Deployment of Deep Learning Models: A Case Study on Pest and Disease Detection in Precision Viticulture

Computers 2025, 14(8), 327; https://doi.org/10.3390/computers14080327
by Telmo Adão 1,2,*, Agnieszka Chojka 1, David Pascoal 1,3, Nuno Silva 1,3, Raul Morais 1,3,4 and Emanuel Peres 1,3,4
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 20 June 2025 / Revised: 3 August 2025 / Accepted: 11 August 2025 / Published: 13 August 2025
(This article belongs to the Special Issue Machine Learning and Statistical Learning with Applications 2025)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In this paper, the authors propose synthetic data-driven methods for deploying deep learning models, in the context of image analysis for pest and disease control in agriculture.

I found this paper to be both informative and well structured. To my understanding, the authors propose combining an LLM with image analysis to provide annotations for the synthetic dataset, which is then used to train the image model. This approach is novel and at the cutting edge of hybrid-model development, which I believe is an important emerging field, especially considering the popularity of LLMs for prompt generation and label retrieval for such synthetic data.

In the authors' Results, I believe they compare the performance of their proposed synthetic hybrid model with the image-only model on unseen data. This yields an improvement. I would ask the authors to clarify, however, whether this improvement is over and above the other model or an absolute accuracy figure, as I had a hard time parsing this from Table 4. I would also ask the authors to interpret the results of Table 5 further. To elaborate, a precision near 1.0 is very good, and artefacts or collection errors should be considered as possible explanations. Of course, it is possible that the model is simply very accurate, but a line or two of further analysis would clarify this.

Finally, would the authors be able to compare the performance of their model against others, or against other studies that have used similar methods? This does not have to be in the exact same domain, but a table comparing the authors' results with those of other studies would provide a benchmark beyond the original label ground truth. I understand that this may be difficult, given the bespoke nature of the method, but even referencing some other results that use hybrid models could be helpful.

Author Response

Dear reviewer,


Thank you for your feedback and time. Please find attached our response letter to your comments.


Best regards,
Telmo Adão
(on behalf of the authors)

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript entitled "Synthetic Data-Driven Methods to Accelerate the Deployment of Deep Learning Models: A Case Study on Pest and Disease Detection in Precision Viticulture" addresses a well-recognized challenge in machine learning: the bottleneck posed by data scarcity, particularly in domains where data collection is time-sensitive or labor-intensive.

The authors propose leveraging synthetic data to accelerate the initial development and deployment of deep learning (DL) models in such contexts, with a focus on precision viticulture. The manuscript outlines two main methods for generating synthetic data—rule-based image processing and generative diffusion models—suggesting that these can be used separately or in tandem. While the strategy is well-motivated and timely, especially given the growing interest in synthetic data across domains, the paper raises several critical points that deserve scrutiny.

First, the manuscript claims that synthetic datasets “significantly accelerate” early-stage DL model development, but it does not quantify this improvement or clarify what “early-stage” entails. Are the resulting models competitive with those trained on real data? What metrics were used to evaluate performance? Without such details, it is difficult to assess the true impact of the proposed methods.

Second, while the dual approach to synthetic data generation adds flexibility, the manuscript does not explain how these methods were validated or how realism and domain fidelity were ensured. In agricultural applications, especially those involving disease and pest detection, visual nuance is critical. Overreliance on synthetic imagery risks missing subtle features that are important for robust classification.

Moreover, although the authors highlight “a couple case studies,” the scope of validation appears limited. It remains unclear whether the approach generalizes well to other crops, conditions, or imaging modalities.

Finally, while the conclusion mentions the necessity of domain-specific fine-tuning, this caveat underscores a limitation: synthetic data may only partially address the underlying challenge. The value of the synthetic-first approach hinges on how much it reduces the subsequent cost and effort of domain adaptation—something not detailed in the paper. That said, the proposed use of synthetic data in this paper is innovative and aligns with urgent practical needs. Nevertheless, the manuscript lacks sufficient methodological detail and empirical evidence to fully convince the reader of its effectiveness and generalizability. A more rigorous evaluation and a clearer exposition of trade-offs would strengthen the argument considerably.

Some minor mistakes:

  1. Line 236: Avoid “etc.” in a scientific paper;
  2. While the summary claims “promising performance,” it provides no metrics to support this. Including numerical results (e.g., accuracy, precision, recall, F1-score), or at least relative improvements compared to baseline models, would help substantiate the claims and make the work more persuasive; a minimal illustrative sketch of such metric reporting follows this list.
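
For context, computing the metrics suggested above could look like the minimal sketch below, assuming scikit-learn is available; the label arrays are purely hypothetical placeholders, and no values here come from the manuscript:

    # Minimal sketch: computing the suggested classification metrics with scikit-learn.
    # y_true and y_pred are hypothetical placeholders, not results from the paper.
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

    y_true = [0, 1, 1, 0, 1, 1, 0, 1]  # hypothetical ground-truth labels
    y_pred = [0, 1, 0, 0, 1, 1, 1, 1]  # hypothetical model predictions

    print(f"Accuracy:  {accuracy_score(y_true, y_pred):.3f}")
    print(f"Precision: {precision_score(y_true, y_pred):.3f}")
    print(f"Recall:    {recall_score(y_true, y_pred):.3f}")
    print(f"F1-score:  {f1_score(y_true, y_pred):.3f}")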

Author Response

Dear reviewer,


Thank you for your feedback and time. Please find attached our response letter to your comments.


Best regards,
Telmo Adão
(on behalf of the authors)

Author Response File: Author Response.pdf
