A Rice-Mapping Method with Integrated Automatic Generation of Training Samples and Random Forest Classification Using Google Earth Engine
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper introduces an enhanced method for mapping rice planting areas using Landsat imagery and the Random Forest algorithm on Google Earth Engine (GEE). The proposed approach (LR) integrates remote sensing data with phenological analysis to automate training sample generation and improve classification accuracy. Overall, the article demonstrates scientific soundness, a well-structured methodology, and properly validated results. I have only a few minor observations:
Title: I suggest removing the word "algorithms" from the title, as only a single algorithm is used.
Keywords: It is recommended not to repeat keywords already contained in the title.
Lines 15-16. The text "the extraction effect... is not good" has a minor grammatical inconsistency (consider "the extraction accuracy … is not optimal"). The term “good” is subjective.
Line 26. “infor-mation” should read “information”.
Line 53. “NDVI” stands for “Normalized Difference Vegetation Index”, not “normalized interpolated vegetation index”.
Line 136. Consider writing the interval [-4,6] in an alternative form instead of "-4-6", as the latter may be confusing.
Line 261. This text is confusing: “F is the frequency of each identified rice pixel counted for each identified rice pixel at the transplanting stage to be recognized as rice.”
Line 484. “climate-assisted integration of automatic generation of training samples” can be simplified to “climate-assisted automatic generation of training samples”, avoiding repetition of the same concept.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The focus of this research is on the use of remote sensing techniques to reliably detect the extent of rice production over an extended period.
The strength of this work is in how various techniques are combined, building on previous efforts along a similar line, to produce results that are more reliable and with a higher resolution when compared with previous efforts.
The process that the authors developed is more of a synthesis of previous efforts. This is extremely useful because the paper demonstrates that when these efforts are combined into a single system of production, the results improve compared with relying on any one of the components. It also outlines where there is perhaps further potential for improvement. In so doing, this paper advances the use of remote sensing, particularly as it relates to Landsat and the locations of rice paddies, toward more reliable land cover and land use detection. I also think the authors are correct in their assertion that this is an important issue for policy making and environmental impact assessment.
I thought they did a good job of integrating the works of others along similar lines leading up to this point.
I think their conclusions are consistent with the methods they deployed and the results they produced.
The references are appropriate.
I have a few suggestions for minor changes; please check the attachment. I found the work very interesting and methodical in approach.
Comments for author File: Comments.pdf
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Several aspects must be highlighted about the RF algorithm:
- ntrees and nfeatures have been specified, but no finer tuning has been provided; different settings might lead to better (or worse) results;
- the RF model type, i.e. binary classification (rice or not rice) or regression, is not specified;
- no confusion matrix is provided for the dataset, so false negatives and false positives are not discussed;
- RF can also provide insights such as the entropy-based or Gini-based information gain of each variable, but this analysis is missing, so readers have little data on feature importance (for example, whether NDVI or other variables have a greater impact on the prediction);
- the authors did not provide information on how the dataset is balanced among the classes of the prediction label;
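The confusion matrix and class balance requested in the points above can be derived directly from reference and predicted labels. The following is a minimal Python sketch, assuming labels coded as 1 (rice) and 0 (non-rice); the function names and the sample labels are illustrative assumptions and do not come from the manuscript.

```python
from collections import Counter


def binary_confusion(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return {key: counts[key] for key in ("tp", "fp", "fn", "tn")}


def summarize(cm):
    """Derive common accuracy metrics; None where a denominator is zero."""
    tp, fp, fn, tn = cm["tp"], cm["fp"], cm["fn"], cm["tn"]
    total = tp + fp + fn + tn

    def ratio(num, den):
        return num / den if den else None

    return {
        "overall_accuracy": ratio(tp + tn, total),
        "producers_accuracy": ratio(tp, tp + fn),  # recall of the rice class
        "users_accuracy": ratio(tp, tp + fp),      # precision of the rice class
        "rice_prevalence": ratio(tp + fn, total),  # class balance in the reference data
    }


# Made-up labels for illustration only (1 = rice, 0 = non-rice).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

cm = binary_confusion(y_true, y_pred)
metrics = summarize(cm)
print(cm)
print(metrics)
```

Reporting the four counts alongside the rice prevalence makes it possible to judge whether a high overall accuracy is driven by the majority class.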
Other aspects are related in general to the research methodology:
- ground truth information, such as its source and reliability, has not been discussed;
- high accuracy levels often mean overfitting or, maybe, simply that there are one or more features that can be used alone for a hard-coded value-based classification (if/elif/else statements);
- the reviewer did not find any reference to how the dataset has been divided into train and test data, or whether any cross-validation has been tried on the results;
- additional metrics should be provided to understand why the algorithm can properly predict values/labels and, on the other hand, what are the conditions in which it fails to provide a correct prediction;
- authors should discuss, after evaluating data and results, whether RF has really provided a simple correlation for a complex scenario, or whether the prediction was simpler in most scenarios and harder in others;
- spatial analysis of the errors could also be an interesting aspect to discuss;
- vegetation indices alone might not be enough either; for example, there could be soil variables that have not been taken into account (texture, water content, etc.) but that could have a greater impact in different datasets or other regions.
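The train/test division and cross-validation raised in the list above can be illustrated with a stratified k-fold split, which keeps the rice/non-rice proportion the same in every fold. This is a minimal Python sketch under stated assumptions: the function name, the fold count and the label counts are hypothetical, and the split is purely random, so it does not account for spatial autocorrelation between neighbouring pixels.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Deal the shuffled indices of each class round-robin into k folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(
            index
            for fold_id in range(k)
            if fold_id != held_out
            for index in folds[fold_id]
        )
        yield train, test


# Made-up, imbalanced sample set: 10 rice (1) and 40 non-rice (0) samples.
labels = [1] * 10 + [0] * 40
splits = list(stratified_kfold(labels, k=5))

for train, test in splits:
    rice_in_test = sum(labels[i] for i in test)
    print(len(train), len(test), rice_in_test)
```

Each held-out fold can then be scored separately, so that the spread of the per-fold accuracies gives an indication of possible overfitting.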
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 3 Report
Comments and Suggestions for Authors
Point 1: the square root of the number of features is indeed a "rule of thumb" when dealing with RF, but it should be verified by looking at OOB errors; only then is it possible to determine the convergence plateau for ntrees and the best value for nfeatures (which has a maximum achievable value and does not converge). Therefore, fine-tuning procedures are still missing;
Point 2: since this is a classification with two possible labels, false negatives and false positives on the test dataset are necessary and must be provided in a confusion matrix; this also affects the meaning of response 3;
Point 4: authors should investigate the "information gain", which (in the case of Gini, old but gold, and in some cases second to mean decrease accuracy) is still a solid statistical methodology that helps us understand which variables explain the outcomes the most. Statistical software used for RF, as well as programming languages and environments like R, can provide authors with a very intuitive plot in which the information gains are listed for each variable of the dataset from the most important to the least. An example of the outcome is here: https://www.mdpi.com/2072-4292/5/6/2838
Point 7: if the score is highly associated with some parameters that can be addressed by hard-coding, then the discussion of the classification accuracy on those parameters that cannot be handled that way would be even more meaningful. This is precisely what a confusion matrix reveals: it makes it possible to understand which conditions lead to false positives or negatives and which are responsible for correct results even without an RF;
As a general concept, RF is a statistical tool that lets us classify things under a well-known domain. Therefore, if we want to predict a class, it is important to understand which cases can be already classified even without a RF and to provide information regarding which features are really helpful to show why a RF is justified.
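The two ideas above, ranking variables by information gain and checking how far a hard-coded threshold gets without an RF, can be illustrated together. The sketch below, in Python, finds the best single threshold for each feature, reports its Gini impurity decrease and the accuracy of the resulting one-line rule. It is only the single-split building block, not the mean decrease in Gini that an RF averages over all trees and nodes; the feature names and values are made up for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, gini_decrease, rule_accuracy) for one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = gini(labels)
    best = (None, 0.0, None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        decrease = parent - weighted
        if decrease > best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            # Accuracy of the rule "predict the majority class on each side".
            correct = (max(sum(left), len(left) - sum(left))
                       + max(sum(right), len(right) - sum(right)))
            best = (threshold, decrease, correct / n)
    return best


# Hypothetical features and labels (1 = rice, 0 = non-rice).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "feature_a": [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9],
    "feature_b": [0.5, 0.1, 0.9, 0.3, 0.2, 0.8, 0.4, 0.6],
}

results = {name: best_split(vals, labels) for name, vals in features.items()}
ranking = sorted(results.items(), key=lambda item: item[1][1], reverse=True)
for name, (threshold, decrease, accuracy) in ranking:
    print(name, threshold, round(decrease, 4), accuracy)
```

A feature whose single threshold already reaches near-perfect accuracy indicates cases that could be classified without an RF; the remaining cases are where the added value of the forest would need to be demonstrated.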
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 3
Reviewer 3 Report
Comments and Suggestions for Authors
Authors addressed most of the comments made in previous rounds. However, since part of the suggestions for improvements have been treated as study limitations that will be covered in future research works, every aspect that has been taken out of this current work should be explained in the discussion section (I'd say limitations go there, while strategies to overcome limitations in future works should go in conclusions).
Author Response
Please see the attachment.
Author Response File: Author Response.pdf