Article
Peer-Review Record

Multi-Scale Context Aggregation for Semantic Segmentation of Remote Sensing Images

Remote Sens. 2020, 12(4), 701; https://doi.org/10.3390/rs12040701
by Jing Zhang 1,†, Shaofu Lin 1,2, Lei Ding 3,* and Lorenzo Bruzzone 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 15 January 2020 / Revised: 8 February 2020 / Accepted: 17 February 2020 / Published: 24 February 2020
(This article belongs to the Special Issue Deep Neural Networks for Remote Sensing Applications)

Round 1

Reviewer 1 Report

This paper proposes a multi-scale context aggregation architecture that exploits context-related information using a deep learning approach. The proposed approach is interesting, and the reported results show better performance than previously proposed schemes. However, some modifications could improve the paper.

In Figure 2, the kernel dimensions should be stated, and the operations that follow the boxes labelled “convs” should be explained. It is also not clear in Figure 2 how the concatenation is performed. It would be useful to include some figures showing how the output is estimated. This is important because the network output becomes the input of the network given in Figure 3. It would be helpful to show the operation of the block in Figure 3 indicated as H/n x W/n: are these the feature matrices of the functional blocks? It should also be shown how the outputs are concatenated. The same observation applies to Figures 4 and 5. In all cases, if the outputs of all blocks in Figures 1-5 were expressed mathematically, the system would be easier to understand.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The authors report on an approach for remote-sensing image segmentation using an architecture helping to maintain spatial detail and localization information. In comparison to other approaches they achieve competitive results. 

Assessment: The manuscript makes a very good impression, with well-written text and high-quality, instructive figures and tables. My comments are therefore not on the technical side but aim at making the manuscript more accessible to the wider RS research community, whose members are interested in segmentation results not for their own sake but as a means of extracting information, for example to conduct land-use/land-cover studies.

The abstract is to the point and the introduction provides the necessary background settings.

However, I would suggest spending a few more words on some details, such as the meaning of "multi-scale context aggregation (architecture)" [L45ff], which recurs throughout the manuscript without being explained in enough detail to be accessible to researchers who do not focus on deep learning but are interested in using segmentation results. As this is a core component of the research, it would be helpful to bring it to the attention of a broader readership. The same goes for "adaptive spatial pooling", which could benefit from a short explanatory paragraph. You refer to aerial images in particular, but I would be interested to learn about your take on remote-sensing image segmentation in a more general context, including satellite data. Any comments on that?

Would you have some recommendations for researchers using such an approach for systematic land-use/land-change studies? Is there any usable output they might be able to make use of? Is it feasible to implement?

You added a list of abbreviations, but it is far from complete. You might want to update it.

You added a number of "arXiv" references. Are these published exclusively on arXiv.org? If so, they are not peer-reviewed. How do you judge their soundness? If they are not published exclusively on arXiv, citing the primary source would probably be more appropriate.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

The paper addresses semantic object segmentation in satellite images, especially the challenges of segmenting objects with unusual sizes and their details. To enhance the ability of an encoder-decoder style HRNet, two additional modules, the Adaptive Spatial Pooling and Reasoning modules, are proposed. Results on two city datasets are impressive, especially with respect to the class "car" on both datasets. Given such good potential for segmenting small objects, readers would be very interested in seeing the proposed method evaluated on bigger datasets with more classes.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The authors have addressed the reviewer's requirements, so the paper can be accepted in its present form.
