Article
Peer-Review Record

Geometry Aware Evaluation of Handcrafted Superpixel-Based Features and Convolutional Neural Networks for Land Cover Mapping Using Satellite Imagery

Remote Sens. 2020, 12(3), 513; https://doi.org/10.3390/rs12030513
by Dawa Derksen 1,*, Jordi Inglada 1,2 and Julien Michel 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 13 November 2019 / Revised: 29 January 2020 / Accepted: 31 January 2020 / Published: 5 February 2020

Round 1

Reviewer 1 Report

The paper is intended to address an important issue in RS image classification: sparsely labelled training data. It has developed a process of using the histogram of auto context classes in superpixels (HACCS) to include contextual information in image classification and conducted a comparison of different classification approaches (CNN, CNN combined with hand-crafted features, semantic segmentation network, and pixel-wise) on two high spatial resolution image datasets. The paper is not very well structured and methodologies and results in some parts of the paper are mixed together. It is also excessively lengthy and should be shortened by removing duplicates and unnecessary/irrelevant information. Moreover, there are some technical issues that should be addressed.
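To make the HACCS idea summarized above concrete, the following is a minimal sketch of a per-superpixel class histogram. It assumes a per-pixel class map from a previous classification pass and a superpixel segmentation of the same shape, both given as 2-D lists of integers. The function names, the toy data, and the choice to normalize the histogram are illustrative assumptions; this record does not describe the authors' actual implementation or the auto-context iterations.

```python
def superpixel_class_histograms(class_map, superpixels, n_classes):
    """Normalized class histogram for each superpixel id.

    class_map and superpixels are 2-D lists of ints with the same shape
    (hypothetical inputs, not the authors' data format).
    """
    counts = {}
    for class_row, segment_row in zip(class_map, superpixels):
        for label, segment in zip(class_row, segment_row):
            hist = counts.setdefault(segment, [0] * n_classes)
            hist[label] += 1
    # Divide by the superpixel size so that histograms are comparable
    # across superpixels of different sizes.
    return {
        segment: [c / sum(hist) for c in hist]
        for segment, hist in counts.items()
    }


def contextual_features(class_map, superpixels, n_classes):
    """Attach to every pixel the histogram of the superpixel it belongs to."""
    hists = superpixel_class_histograms(class_map, superpixels, n_classes)
    return [[hists[segment] for segment in row] for row in superpixels]


# Toy example: a 2x3 class map and a two-segment superpixel map (made up).
toy_classes = [[0, 0, 1],
               [1, 1, 1]]
toy_segments = [[0, 0, 1],
                [0, 1, 1]]
toy_hists = superpixel_class_histograms(toy_classes, toy_segments, n_classes=2)
toy_features = contextual_features(toy_classes, toy_segments, n_classes=2)
```

In this sketch every pixel receives the class histogram of its superpixel as a contextual feature vector, which a subsequent classifier could use alongside the pixel's own features.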

[L.2]: “…Contextual features can be discriminating…”: This is not correct. You can incorporate contextual features into a pixel-wise classification method by simply sliding a kernel over the image and calculating the contextual features of interest for the centered pixels. Therefore, this is not the main reason not to use pixel-wise approaches.

[L.35-85]: Please shorten this part. Very basic information is provided in these lines. Such information is redundant given the prior knowledge of the readers of such a paper.

[L.68-70]: “…these neural networks are able to achieve very strong performances…”: Please change this part to “these neural networks may achieve very strong performance….”. In some cases, these techniques are not able to improve the generalizability of the model, especially in the case of insufficient labeled data.

[L.122-123]: “Moreover, as their design is not based on an end-to-end optimization scheme, they might be less sensitive to incomplete training data.” If you have any reference for this statement, please cite it; otherwise, please avoid making such a statement. However, the fact is that a CNN does not optimize the features! A CNN learns discriminative features in either a supervised or an unsupervised scheme. This can also be done if you use hand-crafted features and an autoencoder MLP (which can be followed by a classifier such as a Softmax, SVM, RF, etc.) to obtain representative/discriminative features.

Section 2 is very wordy. Please reorganize and shorten this section as much as possible. For example, L.304-317 are redundant. In addition, this is a classification research paper, so the literature review should include previous research in this field to judge how this research can be distinguished from it.

[L.353]: “Patch-based methods involve taking a standard CNN architecture like ImageNet [10],..”: ImageNet is a database, not a CNN architecture.

I do not understand the point of Section 3.1. What is the main idea of this long section? It starts from what a CNN is, a sample CNN is shown, and then an experiment in which a CNN is applied is presented. In the last paragraph, it is discussed that the training data are noisy, affecting the classification accuracy. First, why is such an experiment given here and not in the results section? Second, the authors in L.375-378 state the following:

“Secondly, there are errors in the training labels, which are due to the fact that these training data sets are often out of date, and hence do not take into account the land cover changes that may have occurred. This example shows how biases in the training data can cause visible errors in the result, in areas that are entirely absent from the training data. Methods that rely solely on the training data to learn every single aspect of a classification problem can be subject to generalization errors, if the training data is not sufficiently representative of the problem”. Based on which study do the authors claim that the noisy training set problem is specific to CNNs? Moreover, based on which study do the authors claim that a noisy training set can be handled using hand-crafted features?

[L.381]: “For this reason, it will be referred to as the fully-convolutional network in this paper.” Please revise this sentence. The reason that they are called fully convolutional NNs is that all the neurons are fully connected, rather than sparsely connected.

Which strategy did you use to perform sampling?

Please report the values chosen for the parameters of the classifiers used and describe how and why you chose these values.

Table 3 presents the accuracies resulting from the classification maps. However, several of them are very close to each other, and thus it is very likely that they are not statistically significantly different. The authors should apply a statistical test to determine whether the accuracies are statistically different.

One of the main problems in this paper is the application of a CNN to a classification (semantic labeling) problem. In the remote sensing community (at least in recent years), this is rarely done because CNNs were originally proposed for scene recognition, not semantic labeling. As a result, without any experimental study, it could have been concluded that they would not perform well, unless a semantic segmentation network is applied, like the U-Net used in this study.

Careful English editing is needed to correct some grammar issues and typos.
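Regarding the request above for a statistical test on the accuracies in Table 3: the reviewer does not name a specific test. One common option for comparing two classifiers evaluated on the same validation samples is McNemar's test, sketched below with the continuity-corrected chi-square approximation. The function name and the toy labels are illustrative assumptions, and the approximation is generally considered unreliable when the number of discordant samples is small.

```python
import math


def mcnemar_test(truth, pred_a, pred_b):
    """McNemar's test (continuity-corrected) for two classifiers.

    Returns (statistic, p_value). Only the samples on which exactly one
    of the two classifiers is correct contribute to the statistic.
    """
    only_a = sum(1 for t, a, b in zip(truth, pred_a, pred_b)
                 if a == t and b != t)
    only_b = sum(1 for t, a, b in zip(truth, pred_a, pred_b)
                 if a != t and b == t)
    discordant = only_a + only_b
    if discordant == 0:
        # The classifiers never disagree in correctness: no evidence
        # of a difference.
        return 0.0, 1.0
    statistic = (abs(only_a - only_b) - 1) ** 2 / discordant
    # Survival function of a chi-square distribution with 1 degree of
    # freedom, written with the complementary error function.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Toy example (made-up labels): classifier A is right on all ten samples,
# classifier B is wrong on all ten.
toy_truth = [1] * 10
toy_stat, toy_p = mcnemar_test(toy_truth, [1] * 10, [0] * 10)
```

A small p-value would suggest that the two classifiers differ in accuracy on the tested samples. Near-identical overall accuracies with a large p-value would support the reviewer's concern that the methods are not distinguishable.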

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The article is well thought out and well presented; however, it requires some minor corrections before being accepted for publication. Some comments are given in the attached file.

Comments for author File: Comments.pdf

Author Response

The article is well thought out and well presented; however, it requires some minor corrections before being accepted for publication. Some comments are given in the attached file.

Thank you for your review. The issue of using "VHSR" to designate SPOT-7 imagery has been corrected. Presently, the images or experiments are simply referred to as "SPOT-7" or "HSR". Moreover, the other minor typos have been corrected.

Reviewer 3 Report

The authors are trying to tackle a complex issue in land cover mapping from high-resolution remote sensing image data.

I appreciate the effort that was made by the authors but overall, the presentation of the study is quite poor. For example, the Introduction section, consisting of too many short paragraphs, has a very low degree of readability. As a consequence, I suggest a major revision of this section to highlight the research gaps and research problem to be addressed by this study. The remaining sections of the manuscript also need to be restructured.

In terms of the technical aspect, both HACCS and D-CNN approaches were tested on the Sentinel-2 and SPOT-7 data but the resulting thematic and geometric accuracies were different. What are the main reasons for this? What are your general recommendations when we apply such approaches to various remote sensing image data?

Author Response

I appreciate the effort that was made by the authors but overall, the presentation of the study is quite poor. For example, the Introduction section, consisting of too many short paragraphs, has a very low degree of readability. As a consequence, I suggest a major revision of this section to highlight the research gaps and research problem to be addressed by this study. The remaining sections of the manuscript also need to be restructured.

Thank you for your comments. Indeed, sections 1 and 2 were too long and "wordy", so we have made a significant effort to reduce them while keeping the essential information for the reader to comprehend the proposed method. In particular, some of the very basic examples were removed from the introduction, as well as many of the redundancies between the introduction and the other sections. Section 2 was shortened (removal of the overview in section 2.4) to make room for a more detailed literature review that highlights the difference between the proposed and existing research. The experimental results and analysis that were previously presented in section 3 as an illustration of the limitations of CNN architectures were moved to the results section. We hope that these changes make the paper more readable overall.

In terms of the technical aspect, both HACCS and D-CNN approaches were tested on the Sentinel-2 and SPOT-7 data but the resulting thematic and geometric accuracies were different. What are the main reasons for this? What are your general recommendations when we apply such approaches to various remote sensing image data?

Indeed, the two problems present very different spatial, temporal, and spectral resolutions, as well as a different nomenclature of classes. However, the conclusion that HACCS provides a higher degree of geometric accuracy than the CNN method can be observed on both data sets. This study suggests that for the higher spatial resolution images with fewer features (SPOT-7) the use of CNNs may be preferred, under the condition that a higher computational cost and lower geometric accuracy are acceptable. However, if HACCS is applied to such images it must be done with care: if the number of contextual features largely exceeds the number of pixel features, the geometry may be degraded. For the HSR imagery (Sentinel-2) with more features, HACCS seems to be the preferred solution, seeing as it provides a similar performance with a lower computational cost.

This was added as a paragraph in the conclusion section.

Round 2

Reviewer 1 Report

The authors have addressed most of the reviewers' concerns. I have two suggested changes for the revised manuscript.

Line 60, the sentence "this problem is known as semantic segmentation in ..., but is usually referred to as classification by the ...". I don't think this claim is right. "Classification" in RS is much broader than "semantic segmentation". Please correct this.  

My second concern lies in the quality assessment and accuracy assessment. As the authors already said, there is no quantitative accuracy assessment in the comparisons; instead, only cartographic quality is checked visually. The conclusion can be subjective depending on which parts were checked visually. This limitation should be stated clearly.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Dear authors, thanks very much for your effort to address my issues. I am now fine with the revised manuscript.

Author Response

Thank you very much for your comments.
