Article
Peer-Review Record

Hierarchical Object-Focused and Grid-Based Deep Unsupervised Segmentation Method for High-Resolution Remote Sensing Images

Remote Sens. 2022, 14(22), 5768; https://doi.org/10.3390/rs14225768
by Xin Pan 1,2,3, Jun Xu 3, Jian Zhao 1 and Xiaofeng Li 2,*
Submission received: 23 September 2022 / Revised: 6 November 2022 / Accepted: 10 November 2022 / Published: 15 November 2022

Round 1

Reviewer 1 Report

The authors proposed an approach combining traditional and deep learning techniques for semantic segmentation of high-resolution remote sensing images. The idea of controlling the learning and output is interesting, and the results look promising. Overall, the quality of the manuscript is good. However, I have some questions/suggestions for the authors.

 

1. Evaluation

The authors proposed a “semantic segmentation” work, but why did you only compare it with “segmentation” works rather than with “semantic segmentation” works using the corresponding metrics, e.g., mIoU? After all, you proposed a “semantic segmentation” work whose target is to generate a semantic map, and it does not seem right not to evaluate the final result. A “semantic segmentation” work can associate different segments together semantically; therefore, I would consider it a higher-level work than one that simply segments the input. That might be why the UDNN result is worse than the other shallow methods in Table 1: their targets are different. In your case, the maximum number of segments is bounded by R_grid. You then applied a deep neural network to perform an additional semantic-level classification. Therefore, the comparison with segmentation-only works does not seem fair. Can you justify your evaluation method?

 

2. Method

For the unsupervised models at different stages, I wonder how you trained them. Did you train the models with the same data but different numbers of iterations (100, 50, 20)? If so, would it be a concern that the input at later stages consists only of sub-images (r_i)?

 

3. Figures

In Figure 1, there are many long arrows that cross the figure or each other, making it messy and hard to read. Using boxes with different colors might be one way to make it cleaner. In Figure 2, the bold blue arrow between R^i and R^(i+1) is redundant because the direction is already indicated by the black arrows between the images. Similar situations occur in other figures. Your effort to provide many details in the figures is appreciated, but it sometimes makes them hard to read. I would suggest that the authors reexamine and reorganize the figures to keep the necessary information rather than all of it.

 

Author Response

We thank the editor and reviewer for their constructive suggestions, which have greatly helped improve the quality of the paper. The modified contents of the paper are marked in yellow, and we have responded to all the comments below.

 

Comment 1:

  1. Evaluation

The authors proposed a “semantic segmentation” work, but why did you only compare it with “segmentation” works rather than with “semantic segmentation” works using the corresponding metrics, e.g., mIoU? After all, you proposed a “semantic segmentation” work whose target is to generate a semantic map, and it does not seem right not to evaluate the final result. A “semantic segmentation” work can associate different segments together semantically; therefore, I would consider it a higher-level work than one that simply segments the input. That might be why the UDNN result is worse than the other shallow methods in Table 1: their targets are different. In your case, the maximum number of segments is bounded by R_grid. You then applied a deep neural network to perform an additional semantic-level classification. Therefore, the comparison with segmentation-only works does not seem fair. Can you justify your evaluation method?

Response:

(1) Our method is an unsupervised segmentation method

Our method is an unsupervised segmentation method, not supervised "semantic segmentation"; the two differ significantly in their processing goals and results. This kind of method is also called "superpixel clustering" in papers on traditional shallow-model methods. We characterize our method as "unsupervised segmentation" for two main reasons:

1) In the field of deep learning, there are many papers on "image clustering", which refers to the unsupervised categorization of entire image blocks (e.g., scene classification), while our output is at the pixel level.

2) Our paper is consistent with currently popular development kits such as OpenCV and scikit-image; the documentation of both kits uses the term "segmentation" for their "superpixel clustering" functions, and we consistently use the term "segmentation" to help users of these kits quickly understand the functions that HOFG provides and the role it can play.

Since there are no semantics-based training data, the task of unsupervised or clustering methods is not to identify semantic segments but to separate all objects at the smallest cost (the lowest number of segments). The reason the experimental results look like "semantic segmentation" is that Section 5.1 introduces a comparison method based on the ground-truth images, and the output results are colored to show more easily whether a method violates the boundaries of the ground-truth objects or separates those objects correctly.

 (2) Regarding metrics, e.g., mIoU

The mIoU has been added as a new metric in Section 5.1 and in all the accuracy comparison tables.

From the results, we can see that since the unsupervised segmentation method considers all classes in the whole image during the comparison, the results and trends of the mIoU are basically similar to those of the OA. At the same time, since the mIoU is the average of the per-class IoUs, categories with few pixels and lower accuracy exert a disproportionate influence, causing the mIoU to be slightly lower than the OA.
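For illustration, the sketch below (our simplified example in plain NumPy, not the paper's evaluation code; pred and gt are assumed to be integer class maps of the same shape) shows how the OA and the mIoU are computed and why classes with few pixels pull the mIoU below the OA:

import numpy as np

def oa_and_miou(pred, gt, num_classes):
    # OA: fraction of pixels whose predicted class matches the ground truth
    oa = np.mean(pred == gt)
    # mIoU: unweighted mean of the per-class IoUs, so a small, poorly
    # segmented class counts as much as a large, well-segmented one
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(oa), float(np.mean(ious))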

(3) Regarding fair comparison with a UDNN

       As discussed in Section 5.1, any segmentation method can achieve close to 100% OA as long as the number of segments is very large and their size is sufficiently small. The figure below illustrates how smaller segments can directly improve the OA.

 

(Response Figure 1; please see the attached file)

 

So simply improving the OA is not a direct goal of unsupervised or clustering methods; achieving a higher OA with fewer segments is the basis for demonstrating a method's superiority.

When performing pixel-level segmentation without training samples, if we were to simply adopt the strategies of "processing one segment with one category label" and "majority voting", the following typical problem would likely be encountered: if the segments are very fine/small, the corresponding OA will be very high. As an example, for the input image in Figure 1 of this paper, when the UDNN iterates only once, we obtain the following results:

 

(Response Figure 2; please see the attached file)

 

Because the segments at the boundary are sufficiently small and many in number, the OA of this segmentation is close to 100%. However, this excessively fragmented segmentation obviously loses any meaning for the application of remote sensing image segmentation. Therefore, we need to compare all the methods under certain constraints.
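To make the "one segment with one category label" and "majority voting" strategy concrete, here is a minimal sketch (our simplified illustration, not the released evaluation code; segments and gt are assumed to be integer maps of the same shape):

import numpy as np

def majority_vote_labels(segments, gt):
    # assign every pixel of a segment the most frequent ground-truth
    # class inside that segment ("one segment, one category label")
    pred = np.empty_like(gt)
    for seg_id in np.unique(segments):
        m = segments == seg_id
        classes, counts = np.unique(gt[m], return_counts=True)
        pred[m] = classes[np.argmax(counts)]
    return pred

As the segments shrink toward single pixels, each segment's majority class trivially matches the ground truth, which is exactly how the OA approaches 100% while the segmentation itself loses meaning.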

Regarding the statement that "the maximum number of segments is bounded by R_grid": R_grid limits HOFG but not the UDNN. The UDNN can generate any number of output segments, but as discussed in the previous paragraph, unboundedly small segments are not good results. Thus, we attempt to present the comparison in as fair an environment as possible and to draw conclusions based on application value.

 

Comment 2:

  2. Method

For the unsupervised models at different stages, I wonder how you trained them. Did you train the models with the same data but different numbers of iterations (100, 50, 20)? If so, would it be a concern that the input at later stages consists only of sub-images (r_i)?

Response:

Our approach is an unsupervised/clustering method, and like traditional unsupervised/clustering methods (e.g., k-means), HOFG needs no training set and has no training process; HOFG merely obtains segments through the UDNN's iterations.
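To give a concrete sense of what having no training set and no training process means here, the sketch below is an illustrative analogy using scikit-learn's k-means (not the UDNN itself): the clustering is fit directly on the very image being segmented, with no separate training data involved.

import numpy as np
from sklearn.cluster import KMeans

def cluster_pixels(image, k=8):
    # image: H x W x C array; returns an H x W map of cluster ids
    h, w, c = image.shape
    features = image.reshape(-1, c).astype(np.float64)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(features)
    return labels.reshape(h, w)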

 

Comment 3:

  3. Figures

In Figure 1, there are many long arrows that cross the figure or each other, making it messy and hard to read. Using boxes with different colors might be one way to make it cleaner. In Figure 2, the bold blue arrow between R^i and R^(i+1) is redundant because the direction is already indicated by the black arrows between the images. Similar situations occur in other figures. Your effort to provide many details in the figures is appreciated, but it sometimes makes them hard to read. I would suggest that the authors reexamine and reorganize the figures to keep the necessary information rather than all of it.

Response:

We have carefully checked all the figures and improved unclear and redundant markers, especially the arrows. All figures are now more concise than in the previous version.

In particular, Figure 1 has been completely reconstructed.

 

(Response Figure 3; please see the attached file)

 

Author Response File: Author Response.docx

Reviewer 2 Report

Dear authors,

I have finished the review of your proposal. The proposed method seems to achieve better results than other powerful segmentation methods. I have the following recommendations.

1. Check the specs of the employed hardware (it seems that the GPU spec is wrong).

2. Provide a table summarizing the employed dataset (source, number of images, and classes or groups), and justify the choice.

3. If possible, please provide the execution time per experiment to assess its efficiency.

4. Provide one or more tables summarizing the chosen parameters of your model and the unsupervised deep neural networks used.

I hope you find these recommendations useful. 

Author Response

We thank the editor and reviewer for their constructive suggestions, which have greatly helped improve the quality of the paper. The modified contents of the paper are marked in yellow, and we have responded to all the comments below.

 

 

Comment 1:

Check the specs of the employed hardware (it seems that the GPU spec is wrong).

Response:

The video memory description was indeed incorrect; we have changed "11 MB" to "11 GB".

 

Comment 2:

Provide a table summarizing the employed dataset (source, number of images, and classes or groups), and justify the choice.

Response:

       We have added "Table 1. Summary of the experimental image dataset" to the paper.

 

Comment 3:

If possible, please provide the execution time per experiment to assess its efficiency.

Response:

       We have added a new section, "5.5 Execution time comparison", to describe and discuss the execution times of all methods.

 

Comment 4:

Provide one or more tables summarizing the chosen parameters of your model and the unsupervised deep neural networks used.

Response:     

As discussed in Section 5.1, any segmentation method can achieve close to 100% OA as long as the number of segments is very large and their size is sufficiently small. In this paper, the SLIC method is adjusted based on the input image, and the other methods are adjusted with reference to SLIC; thus, the parameters of all methods in this paper are adjusted dynamically. At the end of Section 5.1, we have added "Table 2. Parameter settings for all methods" to describe this process.

 

 

Reviewer 3 Report

1. The specific accuracy improvement of HOFG is not shown in the abstract.

 

2. What are the advantages of this unsupervised method over other supervised methods?

 

3. What are the advantages of this method compared with other mainstream models, e.g., in detection accuracy?

 

4. Is the precision in the abstract recognition precision, or is it some other precision?

 

5. What about the parameter size and recognition speed of this method?

 

6. The abstract lacks a more detailed description of the detection method proposed in this paper; a brief description of the work done in this paper is recommended in the abstract.

 

7. The practical application value of this study should be duly added.

 

8. Refine the language of the article.

Author Response

We thank the editor and reviewer for their constructive suggestions, which have greatly helped improve the quality of the paper. The modified contents of the paper are marked in yellow, and we have responded to all the comments below.

 

Comment 1:

The specific accuracy improvement of HOFG is not shown in the abstract.

Response:

Related content has been added to the abstract.

 

Comment 2:

  2. What are the advantages of this unsupervised method over other supervised methods?

Response:

Our method is an unsupervised segmentation/clustering method, not supervised "semantic segmentation". Since there are no training data, the task of unsupervised or clustering methods is not to identify semantic segments but to separate all objects at the smallest cost (lowest number of segments).

Our unsupervised method does not belong to the same field of application as supervised methods, whose goal is to identify all pixel categories with high accuracy based on training data.

Our method attempts to separate objects in remote sensing images without training samples or any prior knowledge at all. This approach has value in two respects: 1) Like other unsupervised methods, its results can serve as the basic units for object-based image analysis (OBIA). OBIA is an important topic in the field of remote sensing, and high-quality basic units are important for improving OBIA performance. 2) The ability to separate an image into superpixels directly, without prior knowledge or a training set, can significantly speed up processing and reduce the workload when performing tasks involving manual interpretation.
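As a concrete illustration of point 1), the following sketch (our example, assuming scikit-image 0.18 or later; the feature set is arbitrary) treats each segment of a label map as an OBIA unit and extracts simple per-object features:

from skimage.measure import regionprops

def object_features(segments, image):
    # segments: H x W integer label map with labels >= 1
    # image:    H x W x C intensity image
    feats = []
    for region in regionprops(segments, intensity_image=image):
        feats.append({
            "label": region.label,
            "area": region.area,                  # object size in pixels
            "mean_color": region.mean_intensity,  # per-band mean value
        })
    return feats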

 

Comment 3:

What are the advantages of this method compared with other mainstream models, e.g., in detection accuracy?

Response:

       Our method is an unsupervised segmentation/clustering method, not a supervised "object detection" method.

To the best of our knowledge, our method is the first open-source deep unsupervised segmentation method for high-resolution remote sensing images. Our method works in practice and can achieve acceptable segmentation results. Existing deep unsupervised segmentation methods are not applicable in the field of remote sensing.

For further evaluation of the "detection accuracy", the mIoU has been added as a new metric in Section 5.1 and in all accuracy comparison tables.

 

Comment 4:

Is the precision in the abstract recognition precision, or is it some other precision?

Response:

       As discussed in Section 5.1, any segmentation method can achieve close to 100% OA as long as the number of segments is very large and their size is sufficiently small. The figure below illustrates how smaller segments can directly improve the OA.

 

(Response Figure 1; please see the attached file)

 

 

So simply improving the OA is not a direct goal of unsupervised or clustering methods; achieving a higher OA with fewer segments is the basis for demonstrating a method's superiority.

When performing pixel-level segmentation without training samples, if we were to simply adopt the strategies of "processing one segment with one category label" and "majority voting", the following typical problem would likely be encountered: if the segments are very fine/small, the corresponding OA will be very high. As an example, for the input image in Figure 1 of this paper, when the UDNN iterates only once, we obtain the following results:

 

(Response Figure 2; please see the attached file)

 

Because the segments at the boundary are sufficiently small and many in number, the OA of this segmentation is close to 100%. However, this excessively fragmented segmentation obviously loses any meaning for the application of remote sensing image segmentation. Therefore, we need to compare all the methods under certain constraints.

Regarding the statement that "the maximum number of segments is bounded by R_grid": R_grid limits HOFG but not the UDNN. The UDNN can generate any number of output segments, but as discussed in the previous paragraph, unboundedly small segments are not good results. Thus, we attempt to present the comparison in as fair an environment as possible and to draw conclusions based on application value.

 

 

Comment 5: 

  5. What about the parameter size and recognition speed of this method?

Response:

We have added Table 2 to describe the parameter settings of the methods and have added Section 5.5 to discuss the execution times of the methods.

 

Comment 6: 

The abstract lacks a more detailed description of the detection method proposed in this paper; a brief description of the work done in this paper is recommended in the abstract.

Response:

Our method is an unsupervised segmentation/clustering method, not an object detection method, as stated in our reply to Comment 3; the main application area of our method is OBIA.

 

Comment 7:

The practical application value of this study should be duly added.

Response:

Truly proving the "application value" of our method would require more experiments, and since this paper focuses on the details of the method, there is no room to test that value here; therefore, the mentions of "application value" have been removed from the abstract and conclusion.

 

Comment 8: 

Refine the language of the article.

Response:

We have engaged a language polishing service, AJE, to check the contents of the paper. We expect the language quality of this revised paper to meet the reviewer's requirements.

 

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The authors have addressed the comments in the previous review. I recommend accepting the manuscript for publication.

Author Response

Comment 1: The authors have addressed the comments in the previous review. I recommend accepting the manuscript for publication.

 

Response:

We thank the editor and reviewer for their suggestions, which have greatly helped improve the quality of the paper. We also thank them for their encouragement, which bolsters our confidence to continue working on remote sensing research.

Reviewer 3 Report

 

I would be very glad to re-review the paper in greater depth once it has been edited, because the subject is interesting. The authors have added experiments and explanations that make the manuscript richer, more logically coherent, and more convincing, showing that the method does have excellent performance and a certain degree of innovation, but some minor problems still need to be addressed.

1. In Section 5.5, the authors must avoid ambiguous statements such as “in future research, we will analyze the results”; if much work is still ahead, this should be indicated properly. Please write a few sentences about the planned further research, and keep the conclusions and future directions in a dedicated section.

2. It is suggested that the authors also discuss the improvement and prospects of HOFG's execution speed in the conclusion.

3. In lines 158 and 165, I suggest using a serial number different from that in line 136. For example, replace (1) and (2) with (i) and (ii). Alternatively, change the serial numbers in lines 107 and 136 to 2.1 and 2.2.

4. In line 222, a space is missing after the "1.".

5. In lines 331, 332, 445, 446, 696, and 697, the figure sizes in the article are inconsistent, and the positioning of the figure captions may lead readers to mistake them for subtitles.

6. In line 723, a space is missing after the "6.".

7. I would suggest checking that no space is missing between "Figure X" and "(X)".

 

Comments for author File: Comments.pdf

Author Response

We thank the editor and reviewer for their suggestions, which have greatly helped improve the quality of the paper. We also thank them for their encouragement, which bolsters our confidence to continue working on remote sensing research.

 

Comment 1:

  1. In Section 5.5, the authors must avoid ambiguous statements such as “in future research, we will analyze the results”; if much work is still ahead, this should be indicated properly. Please write a few sentences about the planned further research, and keep the conclusions and future directions in a dedicated section.

Response:

We deleted the original content ("in future research, ….") and added the following content to Section 5.5:

From the experimental results, we note that the UCM-TRAIN algorithm of HOFG must run completely from the initial state in each iteration, which consumes a considerable portion of HOFG's execution time. In future work, we will improve UCM-TRAIN so that it can carry over the experience of the previous HOFG iteration, which will greatly improve the training speed of HOFG.
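To make the planned improvement concrete, the following sketch (hypothetical; make_udnn and train_ucm are placeholder callables, not the released API) contrasts the current cold-start behavior, in which UCM-TRAIN restarts from the initial state in every iteration, with the intended warm start that carries the weights across HOFG iterations:

def hofg_iterations(regions, make_udnn, train_ucm, warm_start=True):
    # make_udnn(): returns a freshly initialized UDNN (placeholder)
    # train_ucm(model, region): runs UCM-TRAIN on one region (placeholder)
    model = make_udnn()
    for region in regions:
        if not warm_start:
            model = make_udnn()  # current behavior: restart from scratch
        model = train_ucm(model, region)  # warm start reuses prior weights
    return model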

 

Comment 2:

  2. It is suggested that the authors also discuss the improvement and prospects of HOFG's execution speed in the conclusion.

Response:

       We added the following content to the Conclusion section:

HOFG also has shortcomings: its speed is slower than that of traditional shallow-model methods. The key factor affecting the execution speed is the UCM-TRAIN algorithm. By improving the training strategy of UCM-TRAIN, the execution time of HOFG can be significantly reduced. We will focus our research on this issue to make HOFG useful in fields that need to obtain results quickly or repeatedly.

 

Comment 3:

  3. In lines 158 and 165, I suggest using a serial number different from that in line 136. For example, replace (1) and (2) with (i) and (ii). Alternatively, change the serial numbers in lines 107 and 136 to 2.1 and 2.2.

Response:

The corresponding serial numbers have been corrected.

 

Comment 4:

  4. In line 222, a space is missing after the "1.".

Response:

A space has been added at the corresponding position.

 

Comment 5:

  5. In lines 331, 332, 445, 446, 696, and 697, the figure sizes in the article are inconsistent, and the positioning of the figure captions may lead readers to mistake them for subtitles.

Response:

All the figures have been resized and repositioned, and they are now consistent.

 

Comment 6:

  6. In line 723, a space is missing after the "6.".

Response:

A space has been added at the corresponding position.

 

Comment 7:

  7. I would suggest checking that no space is missing between "Figure X" and "(X)".

Response:

All relevant content has been corrected.

 

 

 
