Article
Peer-Review Record

A Local–Global Framework for Semantic Segmentation of Multisource Remote Sensing Images

Remote Sens. 2023, 15(1), 231; https://doi.org/10.3390/rs15010231
by Luyi Qiu 1, Dayu Yu 2,*, Chenxiao Zhang 2 and Xiaofeng Zhang 3
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 17 November 2022 / Revised: 27 December 2022 / Accepted: 27 December 2022 / Published: 31 December 2022
(This article belongs to the Section Remote Sensing Image Processing)

Round 1

Reviewer 1 Report

Line 461: MLGNet or LGNet?

Line 404: Table 5 or Table 6?

Table 7: PRM or FLOPs? It is inconsistent in the experiments section.

Author Response

Thank you for your helpful comments on our article. Following your suggestions, we have supplemented several data points and corrected several mistakes in the previous draft. We have also attached a point-by-point response letter and made extensive revisions throughout. The detailed point-by-point responses are listed below.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors of the manuscript propose a framework for semantic segmentation using multi-source remote sensing images. The framework takes both local and global attention over the features into account and combines multiple mechanisms, such as a fusion module and transformer modules, to form the whole structure. The manuscript is completed with high quality and adequate experiments.

However, there are some aspects that need minor adjustments or clarifications:

1. Page 5, Line 151: “EfficientNet b2” should be “EfficientNet B2” to keep consistent with the other places where this term appears.

2. On page 7, in the paragraph above Equation (9) and the paragraph above Equation (15), the variables that appear in the equations should also be italicized.

3. The constant b in Equation (9) should be defined in the text. The same applies to Equations (10)–(12) and (14).

4. In Equation (17), what is Loss_CT?

5. A curious question: compared with Table 5, the statistics in Table 6 show that the proposed method achieves higher rankings on Low-veg, Tree, and Car. Is there a reasonable explanation?

6. It would be better to zoom in on some of the result images, as it is currently difficult to distinguish small objects such as cars. Also, why are the DSM images not shown in these figures?

7. Please check the reference carefully, some are not in the right format.

Author Response

Thank you very much for your comments; they are highly appreciated, and we are grateful for your professional review of our article. As you noted, there are several problems that need to be addressed. Following your suggestions, we have made extensive corrections to our previous draft, added the necessary data to supplement our results, and edited the article thoroughly. The detailed corrections are listed below.

Author Response File: Author Response.pdf

Reviewer 3 Report

This paper proposes a novel local–global framework (LGNet) for multi-source remote sensing image semantic segmentation. LGNet uses a dual-source fusion network to extract multi-level features from the IRRG and DSM images and selectively fuses features of different levels. Moreover, an LG-Trans module is designed to help the framework capture fine-grained local features and coarse-grained global features, improving the recognition of multi-scale objects. Furthermore, LGNet introduces a pixel-wise contrastive loss that helps the framework approximate the positive ground truth and move away from wrong negative predictions. The proposed framework achieves state-of-the-art performance, with a 90.45% mean F1 score on the ISPRS Vaihingen dataset and a 93.20% mean F1 score on the ISPRS Potsdam dataset.
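
For readers unfamiliar with the pixel-wise contrastive loss mentioned in this summary, a minimal PyTorch sketch of the general idea follows. It is not the authors' implementation: the function name, the temperature value, and the pixel-sampling strategy are assumptions made purely for illustration.

```python
# Hypothetical sketch of a pixel-wise contrastive loss: each sampled pixel
# embedding is pulled toward same-class pixels (positives) and pushed away
# from other-class pixels (negatives). Not the paper's exact formulation.
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(embeddings: torch.Tensor,
                           labels: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    """embeddings: (N, D) pixel features sampled from the feature map
    labels:      (N,)   class index of each sampled pixel"""
    z = F.normalize(embeddings, dim=1)              # unit-length features
    logits = z @ z.t() / temperature                # (N, N) cosine logits
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    logits = logits.masked_fill(eye, float("-inf")) # exclude self-pairs
    log_prob = F.log_softmax(logits, dim=1)         # contrast against all others
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    per_pixel = -(log_prob * pos_mask).sum(dim=1) / pos_count
    has_pos = pos_mask.any(dim=1)                   # skip classes sampled once
    return per_pixel[has_pos].mean()

# Example: 64 sampled pixel embeddings of dimension 32, 6 classes.
emb = torch.randn(64, 32)
lab = torch.randint(0, 6, (64,))
print(pixel_contrastive_loss(emb, lab))
```

The full training objective would then presumably combine the standard segmentation loss with this term, e.g. Loss = Loss_seg + λ · Loss_CT, which may be what the LossCT in Equation (17) (Reviewer 2's question 4 above) refers to.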

Besides writing and grammar issues, I have the following revision suggestions for the authors to consider:

1. The title “Semantics Segmentation” should be “Semantic Segmentation”.

2. What is NDSM? Please spell out the full name at its first mention.

3. Aerial images use perspective projection while the DSM uses orthographic projection, so the two types of images cannot be aligned pixel by pixel. The DSM helps reduce misclassification due to shadows, but does it increase misclassification of buildings or impervious surfaces? Please analyze.

4. L168: “we propose an LG-Trans module consisting of submodule A and submodule B”, but it is labeled “Part A” and “Part B” in Figure 4, so please be consistent.

5. In the caption of Figure 4, “LG-trans” should be “LG-Trans”.

6. L182: what is TGF?

7. In the third line below Equation (13): “Furthermore, MLF submodules use batch normalization operation and relu activation function after each branch’s convolution or pooling operation to reduce the displacement of internal co-variables of features.” What are the MLF submodules? Please illustrate.

8. In the caption of Figure 5(d), “Potsdam IRRGB Image” should be “IRRG Image” or “RGB Image”. In addition, please unify “image” and “Image” in the subfigure names.

9. Please explain the source of the ground truth. Does it come from manual image labeling or from a field survey?

10. We suggest changing the color scheme of the ground truth in the paper. Blue is usually used to represent water, so it is not appropriate for buildings. We also suggest representing trees and low vegetation with different shades of green.

11. L246: “The original DeepLab v3+ is selected as the baseline”. Why not choose EfficientNet-B2 as the baseline?

12. L295: “We apply submodules A based on LGNet-G” should be “We apply submodule B based on LGNet-D”.

13. L299: “We apply trans-based global fusion submodules based on LGNet-D”; consider changing this to “We apply submodule A based on LGNet-D”.

14. L325: “The LGNet performs better than the LGNet-L and LGNet-G” should be “The LGNet-LG performs better than the LGNet-L and LGNet-G”.

15. L332: “We apply the contrastive loss (CT) based on LGNet-LG”; consider changing this to “We add the contrastive loss (CT) based on LGNet-LG”.

16. L461: “MLGNet” should be “LGNet”.

Author Response

We greatly appreciate your professional review of our article. As you noted, there are several problems that need to be addressed. Following your suggestions, we have made extensive corrections to our previous draft, added the necessary data to supplement our results, and edited the article thoroughly. The detailed corrections are listed below.

Author Response File: Author Response.pdf
