Article
Peer-Review Record

A Cloud-Based Deep Learning Framework for Downy Mildew Detection in Viticulture Using Real-Time Image Acquisition from Embedded Devices and Drones

Information 2024, 15(4), 178; https://doi.org/10.3390/info15040178
by Sotirios Kontogiannis 1,*, Myrto Konstantinidou 2,*, Vasileios Tsioukas 3 and Christos Pikridas 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 26 February 2024 / Revised: 20 March 2024 / Accepted: 22 March 2024 / Published: 24 March 2024
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

- Increase the spacing between figure captions and the paragraphs that follow them.

- Parts of Figure 2 and Table 1 are outside the text margins.

- In Figure 3, the (a) and (b) subtitles are not aligned. The same comment applies to Figure 4.

- The authors should provide example images of their dataset, alongside examples of annotated images.

- Is the annotated dataset the authors created available online? What kinds of images can be found in it? Are all of the images taken under the same weather conditions, or is there some variation? What effects do rain, wind, etc. have on the images and on the final results?

Author Response

Please check the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The paper describes a system for determining the state of health of a vineyard. The system is meant to work autonomously with minimal supervision. As such, it is an impressive work. The paper, however, focuses on the aspect of training and evaluating the performance of different neural networks in this specific application.

The remarks I have are the following:

I understand that the paper focuses on the neural networks, but it would be nice to read about some of the difficulties you mention regarding the UAV flight control, how you solved them, and what the results were. A more elaborate description of the autonomous nodes would also be interesting, maybe at least a more detailed photo. Details about the whole WiFi system also seem interesting, but you do not provide any information about it.

Perhaps another paper could describe these aspects of the system?

 

The more important remarks are the following:

In Figure 4, the drone image inference is 4c, not 4a as written in the text.

I think you have formula 1 the other way round.

Formula 2: what is "m"? What are r_i and r_{i+1}? What is P_intp? The index i is supposed to refer to the boxes.

Formula 3: what do you mean by "target boxes"? What are those? There is also a possible typo in d|b^p, b^{gt}|.

You do not provide the formula for object loss.

Formula 4: what are x and i?

Table 1: the image size for ResNet-50 is wrong.

Below Table 1: the IoT camera is 2 MP, so it cannot capture 4128x4128-pixel images. Do you mean that you performed two separate trainings for the embedded models?

The conclusion about the number of training images seems to be totally unsupported by the results presented in the paper.

The references are not formatted consistently.

Comments on the Quality of English Language

Some sentences are difficult to understand, and there are some repetitions and spelling errors.

Author Response

Please check the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The paper describes a method to detect mildew in vines, using CNN over aerial drone images.

The paper is well organised and clear.  The methods followed are state of the art and adequate to the objectives.

Here are a few suggestions and remarks, to hopefully help improve the paper.

  - The YOLO network is typically fast but requires many epochs to learn, so it may be unfair to stop training YOLO nano at 100 epochs. The authors should at least mention that in the paper or justify why training was truncated at 100 epochs.
 
  - The methods used to train the NN seem to show good performance detecting mildew in the images captured. However, there is the additional problem of capturing images of the right spots in the vineyard. The method will only work if the drones capture images of diseased leaves. Even if that is out of the scope of the current project, that should be discussed as a possible shortcoming of the approach.  Are there any procedures to guarantee that the drones will capture images of the disease if it is present in the vineyard?
 
  - The hardware and software required to apply the process seem quite expensive and cumbersome. The paper could describe the circumstances in which the method is or is not justified.
 
  - The paper mentions a significant outburst of mildew in 2023. It would make a stronger case for the method if it were discussed how the proposed method could contribute to mitigating the problem, and what its possible impact would be.
 
  - The paper's conclusion is quite extensive. It could be shorter and more to the point, highlighting the main advantages and limitations of the proposed method.
 
  - Figure 1 could be adjusted to show clearly what the output of each phase is.
 
  - Equation 1 needs to be revised.

 

Comments on the Quality of English Language

  - There are many typos and awkward sentences throughout the paper, including extra letters, numbers, and even words.
  The manuscript therefore needs careful proofreading by someone with a good command of English.

  - Latin words should be in italics.

Author Response

Please check the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors addressed all of my previous comments and significantly improved their manuscript.

Author Response

Thank you for your time and effort in reviewing our manuscript.

Reviewer 2 Report

Comments and Suggestions for Authors

The paper is much improved now. There are some issues that still need addressing, however.

Regarding IoT node:

Are you using ESP32? If yes, why not say so? If not - why not say it even more clearly?

The OV2640 uses parallel data transfer; SCCB is just an I2C-like configuration channel. I would remove this sentence.

Do you power it from a 12V or a 4.2V battery? You say 4.2V, but then you mention using a 12V-to-3.3V converter. Neither of them can be "directly powered" by a 6V solar panel.

About resizing images - I guess that the method of resizing would significantly influence the results. What about that?

Figure 4d - good idea, but the images are too small.

The link in [62] is not accessible.

Regarding the

"Based on the author’s experimentation, the ResNet-50 model achieved the best accuracy over inference time results if trained on less than 500 annotated images (small datasets). ResNet-101 models can be used as inference models, achieving better precision over inference time values if trained on 500-1500 annotated images (medium datasets), while for big datasets, ResNet-152 models are preferred to infer significantly better accuracy over inference time results."

part - the problem is that you never mentioned the number of training images before. So please say a few words about that earlier on, in the experiment description, or remove it completely.

 

Comments on the Quality of English Language

The legibility is much improved.

Author Response

The paper is much improved now. There are some issues that still need addressing, however.

Response: Thank you very much for your comments, which help us improve our manuscript.
Regarding IoT node:

Comment 1: Are you using ESP32? If yes, why not say so? If not - why not say it even more clearly?

Response: Yes, we are using ESP32. Nevertheless, we avoided mentioning it and focused on the technical specs of the AI-Thinker ESP32-CAM, since brands and model names change over time. An appropriate mention has been added to lines 191-192.


Comment 2: OV2640 has parallel data transfer. SCCB is just an I2C configuration channel. I would remove this sentence.

Response: The mention has been removed from line 193.

Comment 3: Do you power it from 12V or 4.2V battery? You say 4.2V but then you mention using 12V to 3.3V converter. Neither of them can be "directly powered" by a 6V solar panel.

Response: We power it from a 4.2V 18650 battery that, in turn, is directly connected to a 6V/2W solar panel. The 12V-to-3.3V conversion is needed since the 6V panel is also directly connected to the 3.3V pin of the ESP32 module (the battery, the panel, and the 3.3V pin of the ESP32 are connected in parallel). The 12V-to-3.3V step-down converter has been used to tolerate voltage peaks of up to 8.7V from the solar panel.

Comment 4: About resizing images - I guess that the method of resizing would significantly influence the results. What about that?

Response: Resizing images does influence the results. Nevertheless, the experiments have been performed uniformly using 640x640 px images for the cloud-based models and achieved satisfactory mAP results. Bigger images have been used for the mobile models to provide better results.
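As a minimal sketch of what uniform resizing implies for the training data (this is not the authors' code; the function name and coordinates are illustrative): when an image is scaled down to a fixed model input size such as 640x640, any pixel-coordinate bounding-box annotations must be rescaled by the same factors, or the labels no longer align with the leaves they mark.

```python
TARGET = 640  # fixed model input size, as in the response above


def rescale_box(box, src_w, src_h, target=TARGET):
    """Rescale an (x_min, y_min, x_max, y_max) pixel-coordinate box
    from a src_w x src_h image to a target x target image."""
    sx, sy = target / src_w, target / src_h  # per-axis scale factors
    x0, y0, x1, y1 = box
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)


# Example: a hypothetical box on a 4128x4128 drone image, scaled to 640x640
print(rescale_box((1032, 1032, 2064, 2064), 4128, 4128))  # (160.0, 160.0, 320.0, 320.0)
```

Note that because the scale factors are applied per axis, the same function also covers non-square source images, where the aspect ratio changes under a square resize.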

Comment 5: Figure 4d - good idea, but the images are too small.

Response: Figure 4d has been removed from the subfigure environment and added as a separate Figure 5, increasing its space and size.

Comment 6: link in [62] is not accessible

Response: We apologize for this inconvenience. The problem has been fixed; please check it again.

Comment 7: Regarding the

"Based on the author's experimentation, the ResNet-50 model achieved the best accuracy over inference time results if trained on less than 500 annotated images (small datasets). ResNet-101 models can be used as inference models, achieving better precision over inference time values if trained on 500-1500 annotated images (medium datasets), while for big datasets, ResNet-152 models are preferred to infer significantly better accuracy over inference time results."

part - the problem is that you never mentioned the number of training images before. So please say a few words about that earlier on, in the experiment description, or remove it completely.

Response: Yes, you are right. This is more of an intuition that emerged during experimentation. Our experimental dataset was described previously on lines 331-340 (second paragraph from the end of Section 3). Nevertheless, since this is the authors' intuition and is not supported by the experimental results, the statement has been removed.

 

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

I believe the paper has been greatly improved. My main concerns have been well addressed, so the paper in my opinion deserves publication.

Author Response

Thank you for your time and effort in reviewing our manuscript.
