Article
Peer-Review Record

Denoising Diffusion Probabilistic Model with Adversarial Learning for Remote Sensing Super-Resolution

Remote Sens. 2024, 16(7), 1219; https://doi.org/10.3390/rs16071219
by Jialu Sui 1, Qianqian Wu 1 and Man-On Pun 2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 18 February 2024 / Revised: 21 March 2024 / Accepted: 23 March 2024 / Published: 30 March 2024

Round 1

Reviewer 1 Report (New Reviewer)

Comments and Suggestions for Authors

Single Image Super-Resolution (SISR) for image enhancement enables the generation of high spatial resolution in Remote Sensing (RS) images without incurring additional costs. This approach offers a practical solution to obtain high-resolution RS images, addressing challenges posed by the expense of acquisition equipment and unpredictable weather conditions. To address the over-smoothing of the previous SISR models, the diffusion model has been incorporated into RSSISR to generate Super-Resolution (SR) images with enhanced textural details. In this paper, we propose a Diffusion model with Adversarial Learning Strategy (DiffALS) to refine the generative capability of the diffusion model. DiffALS integrates an additional Noise Discriminator (ND) into the training process, employing an adversarial learning strategy on the data distribution learning. This ND guides noise prediction by considering the general correspondence between the noisy image in each step, thereby enhancing the diversity of generated data and the detailed texture prediction of the diffusion model. Furthermore, considering that the diffusion model may exhibit suboptimal performance on traditional pixel-level metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM), we showcase the effectiveness of DiffALS through downstream semantic segmentation applications. Extensive experiments on three datasets, namely Alsat, OLI2MSI, and Vaihingen, demonstrate that the proposed model achieves remarkable accuracy and notable visual enhancements compared to other state-of-the-art methods. This is an interesting research paper. There are some suggestions for revision.

 

1)       At the end of the abstract, it will be more intuitive and convincing to illustrate the qualitative results of a large number of experiments for verifying the superiority and effectiveness.

2)       The motivation is not clear. Please specify the importance of the proposed solution.

3)       The listed contributions are a little bit weak. Please highlight the novelty of the proposed solution.

4)       The related work part is too long. It is suggested to compress this part, which is known in literature.

5)       Limited discussion on computational efficiency: Although this paper briefly describes the impact of the proposed method on real-time performance, a more detailed discussion of computational efficiency, especially during ablation studies, would be valuable. Consider a more nuanced analysis of the trade-off between accuracy and speed.

6)       Discussion on limitations and challenges: This article lacks a section dedicated to discussing potential limitations and challenges of the proposed approach. Acknowledging and addressing these components provides a more complete understanding of the applicability of the approach in different scenarios.

7)       Parameters in the compared methods should be provided.

8)       Make sure your conclusions appropriately reflect on the strengths and weaknesses of your work, how others in the field can benefit from it, and thoroughly discuss future work.

9)       In the reference section, it would be better to search for and cite more recent research, which can better reflect the innovation of this paper.

10)    The references are not appropriately inserted into this paper. Please check.

Comments on the Quality of English Language

NA

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report (New Reviewer)

Comments and Suggestions for Authors

The paper seems to be interesting but it doesn't contain the numbered references.

Comments on the Quality of English Language

The paper seems to be interesting but it doesn't contain the numbered references.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report (New Reviewer)

Comments and Suggestions for Authors

This paper presents an interesting method which greatly improves the look of SISR images. The research is well presented but I have some important remarks linked to the objective of SISR for satellite images. It is very important to clearly explain why the objective of SISR for satellite images (retrieving reality) is different from the objective of SISR for everyday photos (showing realistic images).

 

Intro lines 23-30: SISR does not increase resolution, even if it is often presented like this in the literature. It only better represents the information already in the image. Thus it cannot be compared to an increase of resolution obtained by improving the hardware. It is very important to understand that the improvement due to such image processing does not recover information that was not measured. It only better retrieves the information which is already in the image. You cannot bypass the physics. Explaining things like this would show that you fully understand the problem.

fig1 reverse

144 reverse

226 reverse

§3.1 You should explain more clearly what the GSD is for each dataset, for each LR and each HR image. E.g. for OLI2MSI, explain that you want to generate MSI-like images at 10 m from OLI images at 30 m.

Also, it is not clearly said if the models are different between each dataset (one training for each dataset) or if it is a unique model working on the 3 datasets.

233 it should be the contrary, 30m for OLI and 10m for MSI

238 Vaihingen original data are normally at 9 cm. The illustrations shown at the end are not even at such a high resolution. Please state clearly the GSD of each LR and HR image.

fig3 LR image (a) seems to be the same as HR image (b)

§3.3 The much lower performance for PSNR and SSIM may be explained by the fact that many structures, even if realistic, are in fact hallucinated, while other methods produce fewer details but with less hallucination. The radiometric difference (visible in the color deviation) can also explain this bad score. For remote sensing applications, we do not want realistic images but images as close as possible to reality. This means that hallucinations are not wanted. It is much better to have fewer but true details than more realistic but false details. This is a very important point. You should add a short discussion on this in this § or in the conclusion.
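
The remark on radiometric differences can be made concrete with a minimal sketch. It assumes 8-bit imagery held in NumPy arrays; the function name `psnr` and the synthetic test image are hypothetical and are not taken from the manuscript. A constant brightness offset leaves every structure in the image intact, yet PSNR drops from infinity to a finite value.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two same-shaped images."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


# Synthetic stand-in for an HR image (assumption: values kept in 20..199
# so that adding the offset below never exceeds the 8-bit range).
rng = np.random.default_rng(0)
hr = rng.integers(20, 200, size=(64, 64)).astype(np.float64)

# Same structures, constant radiometric offset of 10 grey levels.
shifted = hr + 10.0

psnr_identical = psnr(hr, hr)
psnr_shifted = psnr(hr, shifted)
```

Here the mean squared error of the shifted image is exactly 100, so the score depends only on the offset and not on any loss of detail. This is why a pixel-level metric alone cannot separate color deviation from hallucinated structure.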

Last, the references are not visible in the file, so it is difficult to assess their relevance.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report (New Reviewer)

Comments and Suggestions for Authors

All my concerns have been addressed. I recommend this paper for publication.

Comments on the Quality of English Language

NA

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report (New Reviewer)

Comments and Suggestions for Authors

The authors actually improved the paper after the previous revision. It would be good to include the description of the metrics abbreviations they used in the tables of performance comparison.

The name of the section Computational Complexity Analysis, as well as the name of the corresponding table, may be changed to running time comparison, since a computational complexity analysis should include a theoretical analysis of the complexity order. The running time performance analysis requires the specification of the hardware infrastructure used in the experiments. These time results should be mentioned in the Conclusions.
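
As an illustration of what a running time comparison measures, the following minimal sketch times a callable with the Python standard library only. The helper name `median_runtime` and the toy workload are hypothetical; the manuscript's actual measurement procedure is not described in this record. Such wall-clock figures are only meaningful together with the hardware they were obtained on, and they say nothing about the theoretical complexity order.

```python
import statistics
import time


def median_runtime(fn, repeats=5, warmup=1):
    """Median wall-clock seconds of fn() over `repeats` timed calls."""
    for _ in range(warmup):
        fn()  # untimed warm-up calls (caches, lazy initialisation)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Toy workload standing in for one model inference (assumption).
elapsed = median_runtime(lambda: sum(i * i for i in range(100_000)))
```

Taking the median over several repeats reduces the influence of a single slow run caused by other processes on the machine.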

Comments on the Quality of English Language

Just minor revision would be necessary.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper proposes a denoising diffusion probabilistic model with adversarial learning, which is evaluated in the field of image super-resolution. In general, the proposed architecture presents novelties in some aspects. My comments are as follows:

1. When discussing current CNN architectures, multiscale architectures should be mentioned. Such methods can capture long-range dependencies, which may contribute to SR. The following reference may provide valuable inspiration in this respect.

Z. Chen, S. Tian, X. Shi and H. Lu, "Multiscale Shared Learning for Fault Diagnosis of Rotating Machinery in Transportation Infrastructures," in IEEE Transactions on Industrial Informatics, vol. 19, no. 1, pp. 447-458, Jan. 2023, doi: 10.1109/TII.2022.3148289.

2. It is suggested to include a discussion on the convergence of the proposed architecture. An iteration curve should be presented in the revision.

3. A Gaussian model is adopted to model the distribution of noise. How can other types of noise be handled? This limitation should be mentioned, as it motivates future studies.

Comments on the Quality of English Language

No comment.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Overall, this is a paper with good organization and writing. In my view, the quality is suitable for Remote Sensing. I have no further issues. 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The paper deals with the reconstruction of Super-Resolution images from low-resolution remote sensing images. Three satellite Remote Sensing datasets are used: OLI2MSI, Alsat, and Vaihingen. Although the results presented in Figures 3 and 4 seem to be very interesting, the paper suffers from a detrimental lack of description of the methods, which requires mandatory corrections.

 

- Line 163-171, please provide a figure with a scheme of the CNP architecture.

- In equation (12), please explain precisely in the text what exactly is . In particular, please specify the output(s) of ND.

- Line 237, please provide a brief description of PSNR, SSIM, FID, and LPIPS, as well as references describing these metrics in detail.

- Lines 245-248, the CNP architecture is insufficiently described. Please provide more details.

- Lines 256 to 259, the authors write that “five images are selected for the training set and 4 for the testing set”. The authors should explain why and how they chose only 9 images in total from datasets containing a dramatically larger number of images (5325 images for OLI2MSI, 2182 for Alsat, 33 for Vaihingen).

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

Comment 1 : 

In their cover letter in response to my last comment, the authors indicate that "due to the considerable size of each Vaihingen image, exceeding 2000*2000 pixels, they must be partitioned for model input" and that they consequently "select 5 and 4 images from Vaihingen, splitting them into 1599 and 1251 segments, respectively, for the training and testing sets".  

In my opinion, partitioning the images is fine, but taking only 5 images for training and 4 images for testing is detrimental to the generalization capacity of the results presented in the paper. Therefore, the paper loses its interest.

Moreover, I think that the details provided by the authors in their response to my last comment should be included in the final version of the paper. 
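
The partitioning step quoted in Comment 1 can be sketched as follows. This is only an illustration under stated assumptions: the patch size of 128 pixels, the image dimensions, and the function name `tile_image` are hypothetical, and the authors' actual tiling scheme (which yields 1599 and 1251 segments) is not given in this record.

```python
import numpy as np


def tile_image(image, patch_size):
    """Split an H x W (x C) array into non-overlapping square patches.

    Incomplete patches at the right and bottom borders are dropped.
    """
    height, width = image.shape[:2]
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patches.append(
                image[top:top + patch_size, left:left + patch_size]
            )
    return patches


# Placeholder for one large aerial image (assumed size 2000 x 2500, 3 bands).
image = np.zeros((2000, 2500, 3), dtype=np.uint8)
patches = tile_image(image, 128)
```

Tiling increases the number of samples but not the number of independent scenes, so all patches cut from one source image should stay on the same side of the train/test split.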

 

Comment 2 :

The paper is not clear enough. It is not obvious to the reader which dataset is used for which task (OLI2MSI and Alsat datasets for super resolution task and Vaihingen for the segmentation task). In particular this fact is not clear enough in the lines in which the authors describe the main contributions of the paper (lines 109 to 124). 

 
