Peer-Review Record

Remote Sensing Image Super-Resolution via Residual-Dense Hybrid Attention Network

Remote Sens. 2022, 14(22), 5780; https://doi.org/10.3390/rs14225780
by Bo Yu 1, Bin Lei 2, Jiayi Guo 3,4,5, Jiande Sun 1, Shengtao Li 1,* and Guangshuai Xie 2
Submission received: 31 July 2022 / Revised: 21 October 2022 / Accepted: 10 November 2022 / Published: 16 November 2022

Round 1

Reviewer 1 Report

1.     Introduction: in paragraph 1, more relevant references published in recent two years should be cited, such as:

(1)    “Saliency-Guided Remote Sensing Image Super-Resolution”

(2)    “Multi-Stage Feature Fusion Network for Video Super-Resolution”

(3)    “Learning interlaced sparse Sinkhorn matching network for video super-resolution”

2.     Method: in paragraph 2, actually, there is a big gap of blur degrees between LR and Ref images, which is hard to model with bicubic interpolation.

3.     In equation 1: the symbols are inconsistent with those in figure 1.

4.     In section 4.1.1: Reference should be added for CUFED.

5.     In section 4.2: it seems that there is no need to compare with SISR methods, since they are proposed under different inputs.

Besides, the comparison methods are too old. Perhaps the author can compare it with some newly proposed methods.

Author Response

Response to Reviewer 1 Comments

 

Point 1: Introduction: in paragraph 1, more relevant references published in recent two years should be cited.

 

Response 1: According to your opinion, I have added the references published in the last two years to the corresponding positions in the article.

 

Point 2: Method: in paragraph 2, actually, there is a big gap of blur degrees between LR and Ref images, which is hard to model with bicubic interpolation.

 

Response 2: As you said, there is indeed a big difference in blurriness between LR and Ref images, so we first apply bicubic down-sampling on the Ref images to obtain IRef↓ with blurriness similar to ILR.
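For illustration only (this is a sketch, not the authors' code): the bicubic degradation applied to the Ref image can be written in numpy with the Keys cubic-convolution kernel. The function names are placeholders, and the kernel is not widened for anti-aliasing, a simplification relative to production resizers.

```python
import numpy as np

def cubic_kernel(x, a=-0.5):
    """Keys cubic-convolution kernel, the basis of bicubic interpolation."""
    x = np.abs(x)
    out = np.zeros_like(x)
    m1 = x <= 1
    m2 = (x > 1) & (x < 2)
    out[m1] = (a + 2) * x[m1] ** 3 - (a + 3) * x[m1] ** 2 + 1
    out[m2] = a * x[m2] ** 3 - 5 * a * x[m2] ** 2 + 8 * a * x[m2] - 4 * a
    return out

def downsample_axis0(img, scale):
    """Bicubic resampling along axis 0 by an integer factor (borders replicated)."""
    h = img.shape[0]
    new_h = h // scale
    out = np.zeros((new_h,) + img.shape[1:])
    for i in range(new_h):
        src = (i + 0.5) * scale - 0.5          # source coordinate of output sample i
        base = int(np.floor(src))
        idx = np.arange(base - 1, base + 3)    # 4-tap kernel support
        w = cubic_kernel(src - idx)
        w /= w.sum()                           # normalize so flat regions stay flat
        out[i] = np.tensordot(w, img[np.clip(idx, 0, h - 1)], axes=(0, 0))
    return out

def bicubic_downsample(img, scale):
    """Downsample a 2-D (grayscale) image along both axes, e.g. Ref -> Ref-down."""
    return downsample_axis0(downsample_axis0(img, scale).T, scale).T
```

In this simplified model, applying `bicubic_downsample` to the Ref image produces a version whose blur level is closer to the LR input before matching.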

 

Point 3:  In equation 1: the symbols are inconsistent with those in figure 1.

 

Response 3: According to your opinion, I have revised the corresponding position in the article. 

 

Point 4: In section 4.1.1: Reference should be added for CUFED.

 

Response 4: According to your opinion, I have revised the corresponding position in the article. 

Point 5: In section 4.2: it seems that there is no need to compare with SISR methods, since they are proposed under different inputs. Besides, the comparison methods are too old. Perhaps the author can compare it with some newly proposed methods.

Response 5: Thank you for your suggestion, but I think it is meaningful to compare with SISR methods. Although the inputs are different, image super-resolution in most remote sensing applications is based on SISR methods, while our method is built on SRNTT. It is therefore necessary to compare with these classic SISR methods.

 

 

 

Author Response File: Author Response.pdf

Reviewer 2 Report

This work presents a new deep learning (DL)-based approach for super-resolution using a residual-dense hybrid attention network. Unlike traditional approaches, the proposed methodology relies on lightweight mechanisms that do not require much network engineering. Furthermore, the proposed method relies on using a Ref and LR image. The authors use SRNTT as the backbone structure in this method to overcome the probable inaccuracy due to the mismatch between the two images. The work addresses an essential task in the field of remote sensing. It is clear in most parts and fits the “remote sensing” journal. However, some key issues must be considered to make the work suitable for publication:

 

1)      There is no information regarding the training steps. For example, the amount of data required to train the network, convergence tests, and complexity?

2)      The presented results for real remote sensing data include only grayscale images. It is essential to show the performance using multiband images, at least RGB.

3)      The visual presentation of the proposed network should be improved, mainly to show how the Ref and LR images are treated.

4)      Graphical visualization of the patch matching process will help understand the work’s main idea. Therefore, I suggest adding such a figure.

 

Additional minor comments:

In the introduction:

Page 1: The authors mention, “…..it is usually impossible to maintain a good spatial resolution,” What is a good resolution? A particular spatial resolution may not be sufficient for single use but is enough for another, depending on the problem’s scale. Using the term “good” is incorrect in this case.

Page 1: The authors mention, Most of the SR methods proposed earlier are based on interpolation technology, but the reconstruction effect of these methods is poor.” However, most recent approaches (in the last decade) rely on novel techniques (including machine learning) and not only interpolations. Consider rewriting this sentence.

 

In section 2. Related work:

Page 2: “SRCNN [15] is the first work in the field of SISR….” This sentence is confusing. I assume you mean that the work in [15] is the first DL-based SISR method? Please recheck this part.

 

Finally, discuss how the proposed method can be adapted to multi and hyper-spectral data.

Author Response

Response to Reviewer 2 Comments

 

Major comments:

 

Point 1: There is no information regarding the training steps. For example, the amount of data required to train the network, convergence tests, and complexity?

 

Response 1: According to your opinion, I have added relevant content to the ablation experiment. 

 

Point 2: The presented results for real remote sensing data include only grayscale images. It is essential to show the performance using multiband images, at least RGB.

 

Response 2: Thank you for your suggestion. Due to my negligence, the dataset did not include multiband images. However, I think the good performance on remote sensing grayscale images and natural RGB images also suggests generalization to remote sensing RGB images.

 

Point 3: The visual presentation of the proposed network should be improved, mainly to show how the Ref and LR images are treated.

 

Response 3: I'm sorry, but I haven't worked out a specific revision plan for this part. If I receive the results of the next round of minor revisions, I will focus on revising this part.

 

Point 4: Graphical visualization of the patch matching process will help understand the work’s main idea. Therefore, I suggest adding such a figure.

 

Response 4: Thank you for your suggestion. Unfortunately, the patch matching process is an offline task and cannot be visualized.

 


Additional minor comments

 

Point 1: Page 1: The authors mention, “…..it is usually impossible to maintain a good spatial resolution,” What is a good resolution? A particular spatial resolution may not be sufficient for single use but is enough for another, depending on the problem’s scale. Using the term “good” is incorrect in this case.

 

Response 1: According to your opinion, I have changed "good" to "high" in the corresponding position in the article. 

 

Point 2: The authors mention, “Most of the SR methods proposed earlier are based on interpolation technology, but the reconstruction effect of these methods is poor.” However, most recent approaches (in the last decade) rely on novel techniques (including machine learning) and not only interpolations. Consider rewriting this sentence.

 

Response 2: According to your opinion, I have revised the corresponding position in the article.  

Point 3: Page 2: “SRCNN [15] is the first work in the field of SISR….” This sentence is confusing. I assume you mean that the work in [15] is the first DL-based SSIR!?. Please recheck this part.

Response 3: According to your opinion, I have revised the corresponding position in the article. 

 

Point 4: Finally, discuss how the proposed method can be adapted to multi and hyper-spectral data.

 

Response 4: According to your opinion, I have revised the corresponding position in the article. 

 

Author Response File: Author Response.docx

Reviewer 3 Report

While the topic and developed network are presented acceptably, there are many items that need to be addressed. Below are the general comments. Please see the detailed comments as well.

1-    Please add line numbers for future versions.

2- The majority of the sentences are written in plain, casual, and rudimentary ways, which is not appropriate for journal articles. Please have the document reviewed by an editor.

3-    A lot of material was presented in an ambiguous way, which requires further explanation.

4-    Please discuss the mentioned papers in the literature review section in more detail.

5-    Please discuss the alignment accuracy of the LR-HR images, and comment on anomalies that are observed in the images.

 

Page 1

1-    Please provide evidence to support the claim that large-scale magnification is not possible via LR images.

2-    What if we don’t have the reference image? It seems like there is a limitation to the type of the LR images.

3-    Please define intra-block.

 

4-    Please replace the “and so on” with better examples; also, building extraction can be part of urban planning. Please provide more diverse examples.

5-    Please explain what is meant by long time series.

6-    Please provide examples of studies that used interpolation technology. Also, please explain what interpolation technologies. Do you mean interpolation methods?

7-    Please update the way you are using references.

Page 2

 

8-    GAN is not a Generative Confrontation Network!

9-    Co-registering remotely sensed data is a hard task, and usually, there are some errors. Please comment on this matter and quantify the registration accuracy. We cannot assume the LR and HR satellite images can align without issues by using lat lon values.

10- Please explain what SRNTT is.

11- Please summarize the finding of successful studies, in particular 10  and 11.

12- Please explain what “Different from natural images, the spatial information of images is very large and different”. Particularly, in what aspects? What is meant by the higher level of abstraction?

13- Deepening the depth, width, and cardinality????

14- Please discuss why your method is “convenient”.

15- Some lightweight mechanism????

Page 3

16- Most of the mentioned studies, for example, 22, 24, and 35 should be briefly discussed within the document, similar to the SRNTT study.

Page 4

17- Please explain inter-dependence in parenthesis.

18- The misalignment needs to be quantified. If this is the case, please make sure to state the values briefly here. If not, it is expected to comment on this matter.

19- SRNTT needs to be reviewed in more detail before start discussing the modifications.

Page 5

20- Please explain briefly what neural feature space is. Also, please comment on the neural features. What they are, and how they are used.

Page 8:

21- Did the authors observe any specific anomalies that were added by the developed network to HR images compared to Ref images? Please compare the number of added anomalies with other methods like VDSR. This needs to be studied as unusual artifacts can result in inaccurate images and can cause issues.

 

Author Response

Response to Reviewer 3 Comments

 

Point 1: Please provide evidence to support that. Large-scale magnification is not possible via LR images.

 

Response 1: Due to the lack of information in low-resolution (LR) images, it is difficult to reconstruct the fine texture of HR images under large magnification factors by relying only on the original LR image information, while Ref images with similar content can alleviate this problem through the exchange of similar textures. The RefSR method can be regarded as a branch task that supplements the SISR method.

 

Point 2: What if we don’t have the reference image? It seems like there is a limitation to the type of the LR images.

 

Response 2: When there is no reference image, the RefSR task will not perform texture exchange because there is no additional input. The RefSR task will be transformed into a SISR task, and its performance will not be lower than that of the SISR task. In addition, because Google satellite images can cover the world, it can provide Ref images with certain content similarity for most satellite images, which ensures the applicability of our method.

 

Point 3:  Please define intra-block.

 

Response 3: I realize that it may be ambiguous to say so. According to your opinion, I have revised the corresponding position in the article. 

 

Point 4: Please replace the “and so on” with better examples, also building extraction can be part of urban planning. Please provide a more diverse examples.

 

Response 4: According to your opinion, I have revised the corresponding position in the article. 

Point 5: Please explain what is meant by long time series.

Response 5: According to your opinion, I have revised "long time series" to "difficult to maintain long time coverage and high spatial resolution at the same time" at the corresponding position in the article.

 

Point 6:  Please provide examples of studies that used interpolation technology. Also, please explain what interpolation technologies. Do you mean interpolation methods?

 

Response 6: Here is my statement error. According to your opinion, I have modified the corresponding position in the article and added examples of studies that used interpolation methods.

 

Point 7: Please update the way you are using references.

 

Response 7: Sorry, I don't understand your meaning here. Can you tell me how to modify the way of using references?

 

Point 8:  GAN is not a Generative Confrontation Network!

 

Response 8: According to your opinion, I have revised the corresponding position in the article. 

 

Point 9: Co-registering remotely sensed data is a hard task, and usually, there are some errors. Please comment on this matter and quantify the registration accuracy. We cannot assume the LR and HR satellite images can align without issues by using lat lon values.

 

Response 9: As you said, co-registering remotely sensed data is a hard task, and there are usually some errors. We do have some deviations when aligning LR and HR satellite images according to longitude and latitude. But for the RefSR task based on texture matching, we search for similar textures in the global scope for exchange. The longitude and latitude matching method can ensure that LR and HR images have certain content similarity, which is sufficient to meet the use premise of the RefSR method. In addition, HR input with deviation is fair for all RefSR methods.

 

Point 10: Please explain what SRNTT is.

Response 10: The main idea of SRNTT is to search for matching textures from IRef in the feature space, and then migrate these textures to ISR in a multi-scale way. This multi-scale texture migration considers the semantic (high-level) and texture (low-level) similarities between ILR and IRef at the same time, which can migrate related textures and suppress irrelevant textures.
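As a rough illustration of this texture search (a sketch, not SRNTT's actual VGG-based implementation), the matching step can be written as a cosine-similarity lookup over dense feature patches. The feature maps and the 3x3 patch size below are arbitrary stand-ins, not values from the paper.

```python
import numpy as np

def extract_patches(feat, p=3):
    """Unfold a (C, H, W) feature map into flattened p x p patches."""
    c, h, w = feat.shape
    patches = []
    for i in range(h - p + 1):
        for j in range(w - p + 1):
            patches.append(feat[:, i:i + p, j:j + p].ravel())
    return np.stack(patches)                 # (num_patches, C * p * p)

def match_textures(lr_feat, ref_feat, p=3):
    """For each LR patch, return the index of the most similar Ref patch
    and its cosine similarity, as in a feature-space texture search."""
    q = extract_patches(lr_feat, p)
    k = extract_patches(ref_feat, p)
    qn = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    kn = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-8)
    sim = qn @ kn.T                          # (n_lr, n_ref) cosine similarities
    return sim.argmax(axis=1), sim.max(axis=1)
```

The matched Ref patches would then be assembled into swapped feature maps for multi-scale transfer; that assembly step is omitted here.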

 

Point 11: Please summarize the finding of successful studies, in particular 10 and 11.

Response 11: CrossNet adopted optical flow to align the input and reference images. However, optical flow is limited in matching long-distance correspondences and is thus incapable of handling significantly misaligned references.

 

Point 12:  Please explain what “Different from natural images, the spatial information of images is very large and different”. Particularly, in what aspects? What is meant by the higher level of abstraction?

 

Response 12: General natural images contain only a few simple objects due to their small coverage area; because of their large coverage area, remote sensing images contain many complex ground objects, such as multiple residential buildings, roads, and trees. A higher level of abstraction refers to the ability to obtain feature maps that contain more useful information.

 

 

Point 13: Deepening the depth, width, and cardinality????

 

Response 13: This refers to using convolution blocks with larger convolution kernel scale and adding more convolution layers.

 

Point 14: Please discuss why your method is “convenient”.

 

Response 14: Here we do not define our method as “convenient”, but the modules added on the basis of the original SRNTT are “portable”.

 

Point 15: Some lightweight mechanism???? 

 

Response 15: Lightweight mechanism means that it will not bring too much computing pressure and storage pressure to the network. 

 

Point 16: Most of the mentioned studies, for example, 22, 24, and 35 should be briefly discussed within the document, similar to the SRNTT study.

 

Response 16: 22 and 23 are the first RefSR methods, and their main ideas are summarized in the paper.

 

Point 17: Please explain inter-dependence in parenthesis. 

 

Response 17: Different channel features of images are interrelated (inter-dependent). CA allows the network to perform feature recalibration. Through this mechanism, the network can learn to use global information to selectively emphasize informative features and suppress less useful ones.
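A minimal numpy sketch of this recalibration, in the style of squeeze-and-excitation channel attention (CA): global pooling squeezes each channel to a scalar, a bottleneck MLP produces per-channel gates in (0, 1), and the gates rescale the channels. The weights `w1` and `w2` stand in for learned parameters and are not values from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Recalibrate a (C, H, W) feature map: global average pool ->
    bottleneck MLP (ReLU then sigmoid) -> per-channel rescaling."""
    squeeze = feat.mean(axis=(1, 2))                       # (C,) channel descriptors
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))   # (C,) gates in (0, 1)
    return feat * excite[:, None, None]                    # emphasize / suppress channels
```

Because the gates are scalars per channel, the mechanism adds only two small matrix multiplies per block, which is what makes it lightweight.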

 

 

Point 18: The misalignment needs to be quantified. If this is the case, please make sure to state the values briefly here. If not, it is expected to comment on this matter.

 

Response 18: Yes, our dataset is not perfect in image alignment, but even so, HR-Ref pairs at this alignment level still have certain content similarity, which is sufficient to meet the premise of the texture-matching-based RefSR method. Of course, given the characteristics of the RefSR method, our method will perform better when the alignment accuracy is higher.

 

Point 19: SRNTT needs to be reviewed in more detail before start discussing the modifications.

 

Response 19: According to your request, I have finished the work. 

 

Point 20: Please explain briefly what neural feature space is. Also, please comment on the neural features. What they are, and how they are used.

 

Response 20: The neural feature space is the space in which the feature vectors lie; each feature corresponds to one coordinate dimension in the feature space.

 

Point 21: Did the authors observe any specific anomalies that were added by the developed network to HR images compared to Ref images? Please compare the number of added anomalies with other methods like VDSR. This needs to be studied as unusual artifacts can result in inaccurate images and can cause issues.

 

Response 21: Compared with the original HR images, the RefSR method does carry over some artifacts from the Ref images. However, during the exchange process, because the patch most similar to the HR patch is found, the influence of these artifacts is greatly reduced.

 

Round 2

Reviewer 1 Report

I have no further comments. 

Author Response

Please see the attachment.

Reviewer 2 Report

The authors have addressed my main comments, but I still have two points that may require minor revision:

1)     The authors did not have the time to address the following comment and ask for the opportunity to address it in the second round: “The visual presentation of the proposed network should be improved, mainly to show how the Ref and LR images are treated”.

2)     In the first round, I commented: “Graphical visualization of the patch matching process will help understand the work’s main idea. Therefore, I suggest adding such a figure”.

And the authors answered: Thank you for your suggestion. Unfortunately, the patch-matching process is an offline task and cannot be visualized.

I mean visualizing the concept of patch-matching you are using, not the process itself. I recommend you make an effort to add such a visualization.

 

Finally, for your future papers, it is recommended to highlight the revised parts in the manuscript, so the reviewers can easily find the changes you made.

 

 

Good luck  

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

The current status of responses is hard to track as there is no line/page number reference to your modifications and you did not use color to highlight changes in the revised version. More importantly, why is there a non-English text in your responses? Please address these issues. 

Author Response

Please see the attachment.

Author Response File: Author Response.docx
