Article
Peer-Review Record

Effects of Training Set Size on Supervised Machine-Learning Land-Cover Classification of Large-Area High-Resolution Remotely Sensed Data

Remote Sens. 2021, 13(3), 368; https://doi.org/10.3390/rs13030368
by Christopher A. Ramezan 1,*, Timothy A. Warner 2, Aaron E. Maxwell 2 and Bradley S. Price 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 28 December 2020 / Revised: 16 January 2021 / Accepted: 18 January 2021 / Published: 21 January 2021
(This article belongs to the Special Issue Remote Sensing Data and Classification Algorithms)

Round 1

Reviewer 1 Report

This study investigates the impact of training sample size on high-resolution image classification. It is an interesting topic, and the manuscript is well written. I suggest accepting the paper.

However, I suggest the authors add more detail in Section 2.8 about how the overall accuracy is calculated.
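
For context, a minimal sketch of how overall accuracy is conventionally computed from an error (confusion) matrix; this is the standard definition, not necessarily the authors' exact Section 2.8 procedure, and the class names and counts below are purely illustrative:

```r
# Hypothetical 3-class error (confusion) matrix: rows = predicted, columns = reference
cm <- matrix(c(50,  3,  2,
                4, 45,  6,
                1,  2, 37),
             nrow = 3, byrow = TRUE,
             dimnames = list(predicted = c("forest", "water", "urban"),
                             reference = c("forest", "water", "urban")))

# Overall accuracy: correctly classified samples (the matrix diagonal)
# divided by the total number of reference samples
overall_accuracy <- sum(diag(cm)) / sum(cm)
overall_accuracy  # 132 / 150 = 0.88
```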

Author Response

We thank you for your careful review of our manuscript, and the thoughtful and insightful comments, which have helped strengthen the paper considerably.


Please see the attachment for our responses.

Author Response File: Author Response.docx

Reviewer 2 Report

The manuscript “Effects of Training Set Size on Supervised Machine Learning Land-Cover Classification of Large-Area High Resolution Remotely Sensed Data” explores how the choice of machine learning model and sample size influences model accuracy and performance. This study is distinctive in that it widely varies the number of training samples used for each model and examines the resulting classification accuracy. Overall, the paper is in good shape and likely requires only relatively minor edits. Below are some suggestions for improvement:

 

  • Why build a more complex model to account for multiple classes when a simpler classification scheme (i.e., binary) would work? This paper is mainly focused on differences between models, model accuracy, and sample size. You could have modeled forest versus non-forest and had largely balanced models in terms of training size. Instead, you have largely imbalanced models in terms of training data per class.
  • I understand that building large training datasets is very time-consuming. I wonder, though, whether 10,000 data points is a large enough sample size to get at differences between models. Random Forest (RF) in particular tends to take a long time to train and can essentially fail when sample sizes exceed 1,000,000. For this reason, I have found that other boosted regression tree models (e.g., XGBoost) outperform RF (see the timing sketch after this list).
  • I understand that you are using machine learning models built and applied in R because it is commonly used. I think that at some point in the manuscript (likely the discussion) you should explain how your choice of software influences model accuracy and application speed. Implementations of the same “families” of machine learning models, within or outside of R, might have produced different results.
  • Using only four training data points for some classes is problematic. I am not sure you should start with such small training sets, given the inherent problems with building any model from so few samples. Why not start at the point where the class with the smallest number of training data points has a minimum sample size of 15 or more?
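
As a rough illustration of the Random Forest versus boosting point above, a minimal timing sketch on synthetic data; this is not the authors' workflow, and the package choices, class names, and parameter settings are assumptions:

```r
# Compare training time of random forest and gradient boosting (XGBoost)
# on a synthetic multi-class dataset; sizes and settings are illustrative only.
library(randomForest)
library(xgboost)

set.seed(42)
n <- 10000                              # training set size; vary to probe scaling
x <- matrix(rnorm(n * 20), ncol = 20)   # 20 synthetic predictor variables
y <- factor(sample(c("forest", "water", "urban"), n, replace = TRUE))

rf_time <- system.time(
  randomForest(x = x, y = y, ntree = 500)
)

xgb_time <- system.time(
  xgboost(data = x, label = as.integer(y) - 1,
          objective = "multi:softmax", num_class = 3,
          nrounds = 100, verbose = 0)
)

rbind(randomForest = rf_time["elapsed"], xgboost = xgb_time["elapsed"])
```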

 

 

Line-by-line comments

Line 85: Please include more information about GEOBIA (geographic object-based image analysis) in the introduction and potentially in the abstract. You only begin to discuss this important topic, which is used throughout the paper, in the last paragraph of the introduction.

Line 257: Please replace “decisions” with “decision”.

Line 391: Nice figure!

Line 499: When you use such a small sample size, you need to make sure readers understand that part of the accuracy is due to chance alone. Some models are more accurate because the samples they contain happen to be more representative, and you could draw training samples that are close to outliers and throw off the models. How did you deal with outliers? Did you attempt to use any data filters to remove them? Additionally, how did you deal with mixed pixels? For example, riparian systems likely contain elements of all the different classes, although because you use higher-resolution data this may be less of an issue.
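
To make the outlier question concrete, a minimal sketch, not drawn from the paper, of a simple per-class z-score screen on training samples; `train_df`, the "class" column, and the feature column names are hypothetical:

```r
# Drop samples more than z_max standard deviations from their class mean in any
# feature; potentially mislabeled, mixed, or extreme samples are removed.
filter_outliers <- function(df, feature_cols, class_col = "class", z_max = 3) {
  keep <- rep(TRUE, nrow(df))
  for (cls in unique(df[[class_col]])) {
    idx <- df[[class_col]] == cls
    for (f in feature_cols) {
      v <- df[[f]][idx]
      s <- sd(v)
      if (is.na(s) || s == 0) next        # skip degenerate features
      keep[idx] <- keep[idx] & (abs(v - mean(v)) / s < z_max)
    }
  }
  df[keep, ]
}

# `train_df` and the feature names are hypothetical placeholders.
clean_train <- filter_outliers(train_df, c("ndvi_mean", "nir_mean", "red_mean"))
```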

Lines 511-515: Please label the images. Would it make sense to zoom in further so that the area around the lake is visible in both images? Zooming out so far makes it difficult to see anything in the original high-resolution image.

Lines 586-599: The differences in processing time that you found are partly a consequence of using R to implement all of the models. The models could be run more quickly in other environments, which makes some of this discussion somewhat moot; I think you need to indicate this somewhere in the discussion. Additionally, accuracy could change somewhat depending on which packages you use.

Lines 613-617: Up to what point (i.e., what sample size) do individual samples in smaller sample sets have more of an effect on classification performance? When do increases in accuracy become negligible? Or are you simply suggesting that all samples be used in the models regardless of the accuracy improvements due to sample size? Is there a per-class minimum sample size that would be a good starting point?

Lines 630-632: Is this really an insight? You will simply trend toward the mean once you have a large enough sample size. One thing your study does not address is how to deal with outliers for which the sample size is always low. Perhaps that is outside the scope of this study?

Line 634: You should also mention somewhere that computation time is strongly driven by how the models are built and applied; i.e., what language the programs that build and apply the models are written in, how those programs are compiled, and what scripting language calls them. Enabling multiprocessing on large computer systems will also greatly speed up processing and can reduce the importance of which program builds and applies the models. Your discussion of program speed is mostly relevant to users working with the same setup as yours within R.
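
As one concrete illustration of the multiprocessing point, a sketch assuming a caret/doParallel workflow in R; this is not necessarily how the authors configured their runs, and the model and data are placeholders:

```r
# Registering a parallel backend lets caret distribute resampling folds across
# cores, which can outweigh differences between individual model packages.
library(caret)
library(doParallel)

cl <- makeCluster(max(1, parallel::detectCores() - 1))  # leave one core free
registerDoParallel(cl)

fit <- train(Species ~ ., data = iris,                  # placeholder data
             method = "rf",                             # random forest via caret
             trControl = trainControl(method = "cv", number = 5,
                                      allowParallel = TRUE))

stopCluster(cl)
```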

Lines 641-652: I think you could get at some of the influence that GEOBIA has on the models by looking at variable importance. Are there highly important variables that could not be derived with a pixel-based approach? If there are not, then your results may be largely mirrored by a pixel-based classification.
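
To illustrate the variable-importance check suggested above, a sketch on synthetic data; the feature names are hypothetical and do not correspond to the authors' actual feature set:

```r
# If only per-segment spectral means rank highly and GEOBIA-specific geometric
# features do not, a pixel-based classification might give similar results.
library(randomForest)

set.seed(1)
n <- 2000
train <- data.frame(
  ndvi_mean    = rnorm(n),   # spectral features a pixel-based approach could also use
  nir_mean     = rnorm(n),
  shape_index  = rnorm(n),   # hypothetical GEOBIA-only geometric features
  segment_area = rnorm(n),
  class        = factor(sample(c("forest", "water", "urban"), n, replace = TRUE))
)

rf  <- randomForest(class ~ ., data = train, ntree = 300)
imp <- importance(rf)                                   # MeanDecreaseGini by default
imp[order(-imp[, "MeanDecreaseGini"]), , drop = FALSE]  # ranked variable importance
```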

Lines 676-679: This is only true because of the gradient boosting model (GBM) implementation you applied and your relatively "small" sample size. Other gradient boosting packages (e.g., XGBoost) have similar run times and can handle much larger samples than Random Forest (RF). The speed of RF drops precipitously at some point, depending on the sample size and the number of variables used.

Author Response

We thank you for your careful review of our manuscript, and the thoughtful and insightful comments, which have helped strengthen the paper considerably.


Please see the attachment for our responses.

Author Response File: Author Response.docx
