Article
Peer-Review Record

Strip Attention Networks for Road Extraction

Remote Sens. 2022, 14(18), 4516; https://doi.org/10.3390/rs14184516
by Hai Huan, Yu Sheng, Yi Zhang and Yuan Liu
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 11 August 2022 / Revised: 7 September 2022 / Accepted: 8 September 2022 / Published: 9 September 2022

Round 1

Reviewer 1 Report

The following concerns need to be clearly explained by the author:

(1) Line 8, page 3: ‘dataset is DeepGlobe road extraction remote sensing map dataset, which has 2242 remote sensing images, each image size is 1500×1500 pixels, randomly selected according to the ratio of 4:1’. To our knowledge, the DeepGlobe road extraction dataset is inconsistent with the author’s description. Please check the related information and add a citation. The URL of DeepGlobe is https://competitions.codalab.org/competitions/18467.

(2) Please add more detail to Figure 4 to clearly describe the method in this paper, especially the encoder block of ResNet-50-C. Does this mean that only 2 of the 4 scales of features are utilized to extract roads? If so, please explain why only these features were selected. In my view, the low-level features contain more detailed spatial locations, which may be lost during down-sampling and cannot be recovered during up-sampling.

(3) Line 4, page 5: ‘roads are often presented as more regular horizontal or vertical lines on an image, the network should pay more attention to the pixel distribution of rows and columns’. Indeed, roads may be designed to be straight in urban areas, but what does it mean that they are presented as more horizontal or vertical, especially for roads in rural areas? The author should provide more analysis of why the ‘strip attention module’ works (a sketch of the general idea appears after this list).

(4) I suggest shortening the experiments that compare the parameter r in Section 3.1.1, since the original authors have already discussed its impact well.

(5) As shown in Table 5, the IoU accuracy for roads is 63.05% on the DeepGlobe dataset. As far as I know, ‘Cascaded Residual Attention Enhanced Road Extraction from Remote Sensing Images’ achieved a higher result of about 66.3% on this dataset, although the data partition is not the same. Could this be because the higher-resolution features were omitted? The author needs to investigate the related research more thoroughly.

(6) Check the manuscript carefully; Figure 9 contains Chinese characters.

(7) The original image does not seem consistent with the extracted results in Figure 13.
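
For context on point (3): strip attention mechanisms typically pool features along entire rows and columns, so that long, thin structures such as roads are emphasized. A minimal PyTorch-style sketch of this general idea (a hypothetical illustration, not the authors’ exact SAM):

```python
import torch
import torch.nn as nn

class StripAttention(nn.Module):
    """Strip attention sketch: pool features along whole rows and columns
    so that elongated structures such as roads are emphasized.
    (Hypothetical illustration, not the paper's exact SAM.)"""
    def __init__(self, channels):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # keep H, squeeze W
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # keep W, squeeze H
        self.conv_h = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv_w = nn.Conv1d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):                              # x: (N, C, H, W)
        # Pool over width: one value per row; refine with a 1-D conv over H.
        row = self.conv_h(self.pool_h(x).squeeze(-1)).unsqueeze(-1)  # (N,C,H,1)
        # Pool over height: one value per column; refine with a 1-D conv over W.
        col = self.conv_w(self.pool_w(x).squeeze(-2)).unsqueeze(-2)  # (N,C,1,W)
        # Broadcast both strips back to (H, W) and gate the input.
        return x * torch.sigmoid(row + col)
```

Gating the feature map with the sum of row and column descriptors lets each pixel respond to context along its entire row and column, which matches the elongated geometry of roads better than a square pooling window, even when the road is not strictly horizontal or vertical.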

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Good paper. Some minor mistakes:

1.
Eq. (3) and others: the \times (cross) operator should be replaced by \cdot in formulas and text. For the size of a vector, \times is acceptable.

2.
p. 8 l. 25 and p. 9 l. 8: r = {2, 4, 8, 16}.

3.
Tables 2–9, as well as p. 13 l. 11 (there are more): a fixed number of digits after the decimal point is required, e.g., 98.1 → 98.10.

4. Language mistake: p. 13 l. 18, dot and comma.

5. A section about "future work" is recommended.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

The authors propose a sub-network for the extraction of road features in the row/column direction of the images and integrate it into a backbone (a ResNet-family model).
The novelty lies in the strip attention module, which splits the information from a specific ResNet layer output and feeds it into the SAM module.
The workflow mainly uses an autoencoder-style technique, and the authors introduce the SAM and CAF modules to improve the generalization ability in road extraction.
The scientific contribution is clear enough, and a full comparison with many deep models has been done.
It is not mandatory to publish your code; however, doing so would be a great help to the research community.

Although this work seems clear and well written, I have some doubts (offered as suggestions):

1) Why did you use only the features of Res-2? Did you try using the fine features extracted in Res-3?
Suggestion: you could use the features extracted by all layers (Res-1 to Res-N) to enrich the input fed into the SAM module (because you would have coarse-to-fine details).
2) Channel attention fusion module:
The skip connection between this module and the SAM module makes sense. However, I do not think that the two branches of the sub-network with max and average pooling can improve your entire process.
Did you try removing the adaptive average pooling and that whole branch in some of your experiments?
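
For reference on point 2, a minimal sketch of a dual-pooling channel-attention branch of the kind described here (a common CBAM-style design, shown as a hypothetical illustration rather than the authors’ exact CAF module), with a flag for the suggested ablation of the average-pooling branch:

```python
import torch
import torch.nn as nn

class DualPoolChannelAttention(nn.Module):
    """CBAM-style channel attention with parallel average- and max-pooling
    branches. (Hypothetical illustration, not the authors' exact CAF module.)
    Set use_avg=False to run the ablation the reviewer asks about."""
    def __init__(self, channels, reduction=16, use_avg=True):
        super().__init__()
        self.use_avg = use_avg
        self.mlp = nn.Sequential(                  # shared bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                          # x: (N, C, H, W)
        n, c, _, _ = x.shape
        scores = self.mlp(x.amax(dim=(2, 3)))      # max-pooling branch
        if self.use_avg:                           # average-pooling branch
            scores = scores + self.mlp(x.mean(dim=(2, 3)))
        return x * torch.sigmoid(scores).view(n, c, 1, 1)
```

Comparing use_avg=True against use_avg=False on the same data split would directly answer whether the average-pooling branch contributes to the final accuracy.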


My evaluation of this work is quite positive; a minor revision is required to improve the manuscript.

Minor revisions:

Introduction:

Line 26: "Recently, road extraction from remote sensing images have become more common;" Please be more specific.
Line 45: Insert a space between the word "connections" and the [14] reference.
Lines 47–50: The main contributions might be written as an enumerated list.

Materials and methods section:

Fig. 1: Center sub-figures (a) and (b).
Fig. 2: Same thing.
Line 12: Please link to Figure 2.
Line 16: "Divided into 3608 instances for the training set and 903 for the test set and..."
Fig. 3: Please center sub-figures (a) and (b).

In subsection 2.2:
Lines 3–8: Specify the optimizer used in your experiments.

Subsection 2.4:
I do not see any link to Figure 4. Please use \ref{}.

Figure 9: I would like to know what the Chinese text says. Please rewrite the image labels in English.

Center Figures 10 and 11.
Figure 12 extends beyond the border. Please resize and center it.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

(1) The definitions of the IoU and the MIoU are inconsistent with the standard intersection-over-union metrics (the standard definitions are sketched after this list).

(2) Correct the representation of formulas (4) and (6).

(3) There are two figures numbered 16.

(4) Figures 15 and 16 show the results of only four related methods, while the table shows eight methods. It would be better to compare the results of all the compared methods.
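
For reference on point (1), the standard metrics are commonly defined per class in terms of true positives (TP), false positives (FP), and false negatives (FN):

```latex
\mathrm{IoU} = \frac{TP}{TP + FP + FN},
\qquad
\mathrm{MIoU} = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + FP_i + FN_i}
```

where the counts are accumulated per class and N is the number of classes.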

Author Response

Please see the attachment.

Author Response File: Author Response.docx
