Article
Peer-Review Record

A Super-Resolution Network for High-Resolution Reconstruction of Landslide Main Bodies in Remote Sensing Imagery Using Coordinated Attention Mechanisms and Deep Residual Blocks

Remote Sens. 2023, 15(18), 4498; https://doi.org/10.3390/rs15184498
by Huajun Zhang 1,2, Chengming Ye 1,2,*, Yuzhan Zhou 1,2, Rong Tang 1,2 and Ruilong Wei 1,2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 7 July 2023 / Revised: 7 September 2023 / Accepted: 11 September 2023 / Published: 13 September 2023
(This article belongs to the Special Issue Machine Learning and Remote Sensing for Geohazards)

Round 1

Reviewer 1 Report

This paper proposes a super-resolution model for landslide images that focuses on the main body of the landslide. The model improves efficiency by strengthening the feature extraction module, and it is compared with several other super-resolution methods. The structure of the paper is well organized and the methods are reasonable. From the detailed analysis, I think the manuscript can be accepted after the revisions described below:

1. What does the question mark in Figure 5 mean? The manuscript does not explain it.

2. It is suggested that the model diagrams of the generator network and the discriminator network in Fig. 5 be divided into two charts.

3. It is suggested that some parameters, such as dimension and size, be shown in the network model diagram.

4. For the generator visualization results in Fig. 10, it is recommended to provide more visualizations, group them by training and validation sets, and analyze the results.

5. The format of line 272 needs to be adjusted.

6. There are three parameters but four interpretations in the explanation of Formula 1, and there are problems with the line format.

There are still many grammar and spelling errors in this paper; I strongly recommend that the authors have the manuscript revised by a native English speaker.

Author Response

Reviewer #1

Thank you very much for your comments and suggestions. We have carefully revised the manuscript, and the revised parts are highlighted in red. Each suggested revision and comment was carefully considered and incorporated. Below we provide a point-by-point list of the changes made in the text for your convenience.

This paper proposes a super-resolution model for landslide images that focuses on the main body of the landslide. The model improves efficiency by strengthening the feature extraction module, and it is compared with several other super-resolution methods. The structure of the paper is well organized and the methods are reasonable. From the detailed analysis, I think the manuscript can be accepted after the revisions described below:

1. What does the question mark in Figure 5 mean? The manuscript does not explain it.

Response: A GAN is composed of two neural networks: one is called the Discriminator and the other the Generator. The Discriminator's only role is to judge whether an input is real or fake, so the question mark next to the Discriminator network in Figure 5 stands for this real-or-fake judgment.

2. It is suggested that the model diagrams of the generator network and the discriminator network in Fig. 5 be divided into two charts.

Response: Fig. 5 (now Fig. 3 and Fig. 4) has been modified as requested, and the two neural networks are now shown in two separate diagrams. The modified figures are presented in the revised manuscript.

3. It is suggested that some parameters, such as dimension and size, be shown in the network model diagram.

Response: More parameters have been added to the network structure diagram as requested, and the modified diagram is presented in the revised manuscript.

4. For the generator visualization results in Fig. 10, it is recommended to provide more visualizations, group them by training and validation sets, and analyze the results.

Response: Fig. 11 (former Fig. 10) has been modified as requested: the training set and test set are grouped in detail, visually analyzed, and the figure has been redrawn.

Figure 11. Generator visualization results on the test set before and after the improvement. (a) Output of the EDCA-SRGAN generator. (b) Output of the SRGAN generator. Blue indicates low response values, green medium values, yellow higher values, and red very high response values.

Figure 12. Generator visualization results on the training set before and after the improvement. (a) Output of the EDCA-SRGAN generator. (b) Output of the SRGAN generator. Blue indicates low response values, green medium values, yellow higher values, and red very high response values.

5. The format of line 272 needs to be adjusted.

Response: The original line 272 (now line 278) has been corrected; the formatting problem was caused by the formula.

6. There are three parameters but four interpretations in the explanation of Formula 1, and there are problems with the line format.

Response: We apologize; due to an oversight, one item was omitted from the interpretation of this formula. We have added the missing explanation to the paper.

Author Response File: Author Response.docx

Reviewer 2 Report

It is an interesting piece of research, and the article can be published after some revisions.

We believe that the way the references are cited in the body of the text does not follow the template and indications of the Remote Sensing journal, which uses the [ref] style.

Line 94: it is stated "In comparison to the works of other researchers,"; please add those references.

Fig. 1: please add a reference scale.

Line 224: the title is out of format.

Figs. 8, 10, and 11: add a reference scale.

Author Response

Reviewer #2

Thank you very much for your comments and suggestions. We have carefully revised the manuscript, and the revised parts are highlighted in red. Each suggested revision and comment was carefully considered and incorporated. Below we provide a point-by-point list of the changes made in the text for your convenience.

It is an interesting research article that can be published after some revisions.

1. We believe that the way the references are cited in the body of the text does not follow the template and indications of the Remote Sensing journal, which uses the [ref] style.

Response: We apologize; due to our oversight, we did not carefully check the reference insertion format of the Remote Sensing journal. The reference format has now been carefully revised in accordance with the journal's requirements.

2. Line 94: it is stated "In comparison to the works of other researchers,"; please add those references.

Response: Thank you for pointing out this problem in our manuscript. We have added three articles related to this statement. They are:

[31]. Liu, B.; Zhao, L.; Li, J.; Zhao, H.; Liu, W.; Li, Y.; Wang, Y.; Chen, H.; Cao, W. Saliency-Guided Remote Sensing Image Super-Resolution. Remote Sens. 2021, 13, 5144, doi:10.3390/rs13245144.

[32]. Ma, J.; Yu, J.; Liu, S.; Chen, L.; Li, X.; Feng, J.; Chen, Z.; Zeng, S.; Liu, X.; Cheng, S. PathSRGAN: Multi-Supervised Super-Resolution for Cytopathological Images Using Generative Adversarial Network. IEEE Trans. Med. Imaging 2020, 39, 2920–2930, doi:10.1109/TMI.2020.2980839.

[33]. Lei, J.; Xue, H.; Yang, S.; Shi, W.; Zhang, S.; Wu, Y. HFF-SRGAN: Super-Resolution Generative Adversarial Network Based on High-Frequency Feature Fusion. J. Electron. Imaging 2022, 31, 033011, doi:10.1117/1.JEI.31.3.033011.

3. Fig. 1: please add a reference scale.

Response: Thank you for pointing out this problem in our manuscript. After experiments and consideration, we find it difficult to add a reference scale, for the following reasons: 1. Our UAV data cover the entire Jinsha River; the dataset in this paper was selected from a large amount of UAV data, and the geographical coordinates of the images are far apart and not contiguous. 2. This paper is a methodological study, without targeted research on a particular study area. 3. Because of the complex terrain in the field, the UAV flight parameters differ between acquisitions, so it is difficult to add a reference scale to each image individually. 4. Based on the UAV and flight parameters, our rough calculation gives a ground resolution of 0.02 m/pixel, and we estimate the scale to be about 59.88 (each pixel represents an actual ground distance of 59.88 cm). The problem you pointed out also reminds us that, in future research, discovering and studying the patterns of landslides in a specific area will be very important.

4. Line 224: the title is out of format.

Response: We apologize; due to our oversight, the format check of the article was insufficient. The problem has been fixed in the resubmitted manuscript.

5. Figs. 8, 10, and 11: add a reference scale.

Response: Thank you for pointing out this problem in our manuscript. After experiments and a literature review, we find it difficult to add a reference scale, for two reasons: 1. When building the dataset, the collected images were unified to a size of 512 × 512 pixels, so their proportions changed to some extent. 2. The test set was also selected from the UAV data we collected, and the image locations differ so much that it is difficult to add a reference scale to each image one by one.

Author Response File: Author Response.docx

Reviewer 3 Report

This paper presents a super-resolution algorithm applied to remote sensing imagery of landslides. The contribution of this paper is a super-resolution deep network. In my opinion it is much more an image processing paper than a remote sensing one, because it develops a method for super resolution and does not address remote sensing problems.

1. It is difficult to rely on resampled data; most experts do not use any reconstruction algorithms, especially in landslide applications. It would be interesting to identify the end user within the paper.

2. Section 3 should be reorganized to first present the network architecture and then the detailed description.

3. The EDCA algorithm or structure should be explained. If the motivation is to lower computational complexity, then the authors should specify the amount of enhancement.

4. Fig. 5 is not well commented within the text. It is difficult to follow the notation in Fig. 5. The problem is that Fig. 5a has two subfigures, and it is not clear what the upper part of Fig. 5a represents. The same problem exists with Fig. 5b.

5. What kind of images are you resampling? RGB, hyperspectral?

6. What is the ground resolution of the high-resolution images?

7. Descriptions of the training and validation procedures are missing.

8. A comparison with the most widely used methods is missing.

9. In Section 5 you could include multiple views to enhance data resolution and features.

10. Check the captions of images and section titles.

Author Response

Reviewer #3

Thank you very much for your comments and suggestions. We have carefully revised the manuscript, and the revised parts are highlighted in red. Each suggested revision and comment was carefully considered and incorporated. Below we provide a point-by-point list of the changes made in the text for your convenience.

This paper presents a super-resolution algorithm applied to remote sensing imagery of landslides. The contribution of this paper is a super-resolution deep network. In my opinion it is much more an image processing paper than a remote sensing one, because it develops a method for super resolution and does not address remote sensing problems.

1. It is difficult to rely on resampled data; most experts do not use any reconstruction algorithms, especially in landslide applications. It would be interesting to identify the end user within the paper.

Response: Thank you for pointing out this problem in our manuscript. In some applications, especially those requiring accurate measurement or detailed analysis, experts may prefer to work with the original, unresampled data as much as possible, which helps minimize any potential artefacts or errors introduced during resampling. We regret that it is difficult for us to describe the specific sampling process in more detail, for the following main reasons:

① For the bicubic interpolation used in this paper, Ref. 24 gives a detailed explanation of the method; since this is the only sampling method used here, it is not the focus of this paper.

② One limitation of remote sensing super-resolution research is the lack of corresponding high-resolution and low-resolution image pairs. Thanks to project support, UAV data could be collected in the field, but corresponding low-resolution satellite images could not; moreover, owing to revisit periods and cloud cover, the processing required for satellite imagery affects its hue. Therefore, this paper follows related work such as Gong, Y.; Liao, P.; Zhang, X.; Zhang, L.; Chen, G.; Zhu, K.; Tan, X.; Lv, Z. Enlighten-GAN for Super Resolution Reconstruction in Mid-Resolution Remote Sensing Images. Remote Sens. 2021, 13, 1104, doi:10.3390/rs13061104, in which low-resolution images are obtained by downsampling high-resolution ones.

Naturally, different downsampling methods may introduce artifacts into the resulting super-resolution images, and we will further investigate the impact of these effects in subsequent research. A minimal sketch of the pair-generation step just described is given below.
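As a hedged illustration of this HR-to-LR pair generation (a sketch assuming Pillow and a ×4 factor matching the 512 → 128 setup later in this response, not the authors' actual code):

```python
from PIL import Image

def make_lr(hr_path: str, factor: int = 4) -> Image.Image:
    """Bicubically downsample a high-resolution image to create its LR pair."""
    hr = Image.open(hr_path)
    lr_size = (hr.width // factor, hr.height // factor)
    return hr.resize(lr_size, resample=Image.Resampling.BICUBIC)

# e.g. a 512 x 512 UAV crop becomes the 128 x 128 low-resolution input
```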

2. Section 3 should be reorganized to first present the network architecture and then the detailed description.

Response: Thank you for pointing out this problem in our manuscript. We have modified the structure of the Methods section as requested. The original order (EDCA block → Coordinate Attention → Network architecture → Loss function → Image quality evaluation index) has been changed to Network architecture → EDCA block → Coordinate Attention → Loss function → Image quality evaluation index.

The model structure is now introduced first, followed by descriptions of its component parts.

3. The EDCA algorithm or structure should be explained. If the motivation is to lower computational complexity, then the authors should specify the amount of enhancement.

Response: Thank you for pointing out this problem in our manuscript. We are sorry that the EDCA module was not explained clearly and that the explanation was hard to find; we have therefore renamed Section 3.2 from "EDCA block" to "EDCA structure" so that it reads clearly.

The EDCA module is designed to improve the feature extraction ability of the model while limiting the growth in the number of parameters. In theory, the more parameters a model has, the stronger its expressive ability and the better it can fit the training data. Our model stacks ten EDCA residual blocks, which would otherwise add many parameters, so the EDCA residual block we designed uses depthwise separable convolution to achieve this goal. The comparison of parameter counts is shown in the last column of the table below, followed by a short sketch of where the saving comes from.

Method        PSNR (dB)   SSIM      Block number   Params number
SRGAN         25.46945    0.67910   10             1,103,377
BAM-SRGAN     25.11715    0.67298   10             1,118,177
EDCA-SRGAN    26.25111    0.68343   10             1,156,497

Without this parameter-saving design, the parameter count of the whole model would reach 1,476,369, much higher than that of our model.
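For illustration only (a minimal sketch, not the authors' EDCA code; the 64-channel width is an assumption), the following PyTorch snippet shows why replacing a standard 3 × 3 convolution with a depthwise separable one cuts the parameter count so sharply:

```python
import torch.nn as nn

channels = 64  # hypothetical channel width, for illustration only

# Standard 3x3 convolution: channels * channels * 9 weights (+ bias)
standard = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

depthwise_separable = nn.Sequential(
    # Depthwise: one 3x3 filter per input channel (groups=channels)
    nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels),
    # Pointwise: 1x1 convolution mixes information across channels
    nn.Conv2d(channels, channels, kernel_size=1),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard))             # 36,928 parameters
print(count(depthwise_separable))  # 4,800 parameters
```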

4. Fig. 5 is not well commented within the text. It is difficult to follow the notation in Fig. 5. The problem is that Fig. 5a has two subfigures, and it is not clear what the upper part of Fig. 5a represents. The same problem exists with Fig. 5b.

Response: Thank you for pointing out this problem in our manuscript. We apologize for the unclear interpretation of the subfigures. We have modified the figure in the manuscript so that the structure diagram can be understood clearly.

5. What kind of images are you resampling? RGB, hyperspectral?

Response: Thank you for pointing out this problem in our manuscript. To clarify the data used in this paper: the objects of resampling are UAV images, and due to equipment limitations all the UAV images we collected are RGB. We have added an explanation in Section 2 (Materials).

6. What is the ground resolution of the high-resolution images?

Response: Thank you for pointing out this problem in our manuscript. Because of the complex topography in the field, the flying altitude varies during UAV acquisition, and the ground resolution of UAV images changes with the flying altitude. The ground resolution is determined by the pixel size of the UAV camera sensor, the focal length, and the flying altitude. Generally speaking, it can be calculated by the following formula: ground resolution = (flight altitude × pixel size) / focal length, where the flight altitude is the vertical distance of the UAV relative to the ground, the pixel size is the physical size of a pixel on the camera sensor, and the focal length is that of the camera lens.

As the formula shows, when the flight altitude increases, the ground resolution becomes coarser and fewer ground details can be distinguished; conversely, when the flight altitude decreases, the ground resolution becomes finer and the resolvable ground details become richer.

Therefore, it is difficult to state a single ground resolution for the high-resolution images in this paper; our rough calculation gives about 0.02 m/pixel (a minimal sketch of this calculation follows). Moreover, with respect to the research direction of this paper, the resolution of this dataset is far higher than that of the other data sources, so it is used as the high-resolution dataset.
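As an illustration of the formula above (the altitude, pixel size, and focal length are hypothetical values chosen so the result matches the 0.02 m/pixel figure quoted, not the authors' actual flight parameters):

```python
def ground_resolution(flight_altitude_m: float,
                      pixel_size_m: float,
                      focal_length_m: float) -> float:
    """Ground sample distance (m/pixel) = altitude * pixel size / focal length."""
    return flight_altitude_m * pixel_size_m / focal_length_m

# Hypothetical UAV parameters: 100 m altitude, 2.4 um sensor pixel pitch,
# 12 mm focal length.
gsd = ground_resolution(100.0, 2.4e-6, 12e-3)
print(f"{gsd:.3f} m/pixel")  # 0.020 m/pixel
```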

7. Descriptions of the training and validation procedures are missing.

Response: Thank you for bringing up this issue. We sincerely apologize for the absence of specific training and testing details in the "Implementation details" section. To rectify this, we have revised and extended the section to include the necessary information. Please find the modified version below:

Implementation details:

  1. Dataset: The dataset used in this study consists of high-resolution images (IHR) selected from UAV images. These images were cropped to 512 × 512 pixels, focusing on landslide objects with rich texture and detailed information. To augment the dataset, the images were rotated (0°, 90°, 180°, and 270°) and mirrored, yielding eight-fold samples (see the augmentation sketch after this list). In total, there were 1271 IHR images.
  2. Data Split: Out of the 1271 IHR images, we divided the dataset into two parts: 1000 images for training and 271 images for testing. To create the training set, we randomly selected 1000 images, ensuring a diverse representation of landslide objects. The remaining 271 images were reserved exclusively for testing.
  3. Training Process:
     a. Model Architecture: The SRGAN (Ledig et al., 2017) backbone was enhanced with the proposed EDCA architecture; a variant using the BAM (Park et al., 2018) attention mechanism was trained for comparison.
     b. Input Preparation: For training, we used bicubic down-sampling to generate low-resolution images (ILR) from the IHR images. These ILR images, along with their corresponding IHR counterparts, were used as input pairs for the network.
     c. VGG-19 Parameters: Pre-trained parameters of the VGG-19 model were imported to assist feature extraction during training.
     d. Learning Rate and Optimization: The initial learning rate was set to 2e-4 and followed a cosine decay schedule during training; the Adam optimizer was used to optimize the network parameters.
     e. Training Epochs: The training process consisted of 200 epochs.
  4. Testing Process: a. Input: The testing phase involved directly feeding 128 × 128-pixel images into the network. b. Super-resolution Results: The network processed the input images and generated super-resolution images as the output.
  5. Hardware Setup: The experiments were conducted on an NVIDIA GeForce RTX 3090 (24 GB) GPU with 64 GB of RAM.
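As a minimal sketch of the eight-fold augmentation in step 1 (assuming Pillow images; the authors' actual pipeline is not reproduced here), each image yields four rotations, each with and without mirroring:

```python
from PIL import Image

def eightfold_augment(img: Image.Image) -> list[Image.Image]:
    """Rotate by 0/90/180/270 degrees, each with and without mirroring: 8 samples."""
    samples = []
    for angle in (0, 90, 180, 270):
        rotated = img.rotate(angle, expand=True)  # exact for multiples of 90
        samples.append(rotated)
        samples.append(rotated.transpose(Image.Transpose.FLIP_LEFT_RIGHT))
    return samples
```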

We hope these modifications provide the necessary details regarding the implementation, experiments, and testing procedures. Please let us know if you have any further suggestions or requirements.

In the manuscript, the above contents are collated and revised as follows:

The dataset was selected from UAV images cropped to 512 × 512-pixel high-resolution images. Our selection criteria focused on landslide objects with rich texture and detailed information. To enlarge the dataset, the cropped images were rotated and mirrored at four angles (0°, 90°, 180°, 270°), yielding eight-fold samples. This process produced a total of 1271 high-resolution reference images (IHR), each with a resolution of 512 × 512 pixels. Of these, 1000 were allocated for training and 271 for testing. The images were then downsampled by a factor of four to 128 × 128 pixels, forming the corresponding ILR inputs. During training, the LR images were fed into the networks: the proposed EDCA-SRGAN architecture was compared with bicubic upsampling, SRGAN [37], and a variant using the BAM [38] attention mechanism. The parameters of the VGG-19 model were pre-imported, and an initial learning rate of 2e-4 was used; the learning rate followed a cosine decay schedule throughout the 200 epochs of training (a minimal sketch of this schedule follows). In the testing phase, the network directly processed the 128 × 128 images to obtain the super-resolution results.
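The optimizer and schedule just described might look like the following sketch (assuming PyTorch, with a placeholder model; the 2e-4 rate, Adam, and 200 epochs come from the text above, everything else is illustrative):

```python
import torch

model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)  # placeholder generator
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
# Cosine decay of the learning rate over the 200 training epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):
    # ... one pass over the 1000-image training set would go here ...
    optimizer.step()   # after backpropagating the generator/discriminator losses
    scheduler.step()   # update the learning rate once per epoch
```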

The experimental hardware setup consisted of an NVIDIA GeForce RTX 3090 (24 GB) GPU and 64 GB of RAM.

8. A comparison with the most widely used methods is missing.

Response: Thank you very much for raising this question about our manuscript. The comparative experiments in this paper were designed as follows:

① This paper improves upon the SRGAN model. Comparable articles often include SRCNN in their comparison experiments; however, SRCNN was proposed in 2015, and with the development of super-resolution models current research focuses mainly on GAN models and diffusion structures. This paper therefore omits SRCNN as a comparison model, since the comparison with SRGAN adequately covers it.

② One of the steps in constructing the EDCA module is the addition of the CA attention mechanism, so the second control group compares against the conventional BAM attention mechanism. From this we conclude that not all attention mechanisms can be integrated smoothly into the network and yield gains; for the research object and goal of this paper, the selected CA attention mechanism performs excellently on landslide objects.

③ There are many excellent results in remote-sensing super-resolution reconstruction, including improvements based on GAN models, but their open-source releases are incomplete and difficult to use directly; other methods are variants of related models whose model improvements are driven by their own research goals and objects.

④The performance of SRGAN can be observed in Reference 37, where it has been compared to other commonly employed methods, exhibiting significantly superior performance. This paper builds upon that, introducing enhancements and conducting comparative experiments on SRGAN.

Therefore, this paper chooses several representative classical methods to describe the improved model in more detail, including the visualization of landslides in Figs. 11 and 12.

9. In Section 5 you could include multiple views to enhance data resolution and features. Check the captions of images and section titles.

Response: Thank you for pointing out this problem in our manuscript. We have added detailed comparisons as you suggested so that the differences can be seen; the revised Figure 9 is reflected in the manuscript.

Author Response File: Author Response.docx

Round 2

Reviewer 3 Report

The authors answered all of my questions.
