Peer-Review Record

Enhancing Perception Quality in Remote Sensing Image Compression via Invertible Neural Network

Remote Sens. 2025, 17(12), 2074; https://doi.org/10.3390/rs17122074

by Junhui Li and Xingsong Hou *

Reviewer 1: Anonymous

Reviewer 2: Zhipeng Dong

Reviewer 3: Lei Yang

Reviewer 4: Kun Gao

Submission received: 9 February 2025 / Revised: 7 May 2025 / Accepted: 12 May 2025 / Published: 17 June 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper innovatively proposes the application of Invertible Neural Networks (INNs) to the field of remote sensing image compression. By encoding compression distortions into latent variables following a Gaussian distribution and leveraging inverse mapping, it generates enhanced images with superior perceptual quality. The proposed INN-RSIC framework effectively balances perceptual quality and fidelity through the design of channel expansion, Haar transform, Conditional Generation Module (CGM), and Quantization Module (QM). As a lightweight Plug-and-Play (PnP) module, it enhances flexibility. Experimental results demonstrate that this method outperforms existing mainstream algorithms (such as HiFiC, ELIC, etc.) under low complexity, providing significant value for research in the field of remote sensing image compression.

Modification Suggestion:

The invertible block proposed in the paper achieves bidirectional mapping through channel splitting, scaling, and shifting (Equations 1-3), but it does not explicitly explain how the absolute value of the Jacobian determinant is always guaranteed to be 1 (satisfying the invertibility condition). In practical applications, will minor deviations in scaling parameters lead to numerical instability? Can mathematical proofs or numerical experiments be provided to verify the robustness of this design?
The CGM conditions latent variables into a Gaussian distribution (Equation 5), but does the compression distortion of remote sensing images naturally follow a Gaussian distribution? If the actual data distribution deviates from the Gaussian assumption, will it introduce reconstruction errors? Is there any evidence supporting the rationality of this distribution choice (such as KL divergence analysis or data distribution visualization)?
Haar transform is used in the paper to separate high-frequency and low-frequency features (Figure 2), but it does not explain why Haar transform is chosen instead of other linear transforms (such as DCT). Can it be proven that this decomposition has optimality or computational efficiency advantages in compressive sensing tasks? Additionally, will the blocking effect of Haar transform affect reconstruction accuracy?
The design purpose of QM is to mitigate data type conversion issues during the inference stage, but its specific impact on perceptual quality has not been thoroughly analyzed. How does quantization error get compensated through the invertible network? Is there a trade-off between perceptual loss (such as LPIPS) and reconstruction loss? Can loss function gradient analysis or ablation experiment results be provided?
Experiments show limited LPIPS gains at low bit rates (Figure 9), but the reasons are not analyzed (such as severe compression distortion making it difficult for INN to recover details, or insufficient model capacity). Can this phenomenon be improved by introducing an adaptive weighting mechanism (such as dynamically adjusting the perceptual loss based on bit rate)? It is recommended to supplement comparative experiments under low bit rate training data.
Ablation experiments verify the necessity of QM and CGM, but they do not explore the synergistic mechanisms between them and other modules (such as channel expansion). For example, does the output of CGM depend on the feature distribution after channel expansion? Are there any combinations of modules that significantly degrade performance? It is recommended to supplement correlation analysis or visualize intermediate features.
The proposed enhancement module needs to seamlessly integrate with existing compression algorithms (such as ELIC), but is it sensitive to the encoding parameters of the compressed bitstream (such as QP values, block partitioning) during actual deployment? Can it ensure improved perceptual quality under non-ideal encoding conditions (such as severe loss of high-frequency information)? It is recommended to supplement robustness tests for different encoding strategies.
The current experiments only verify the enhancement effect on images decoded by ELIC, but they do not compare the output with other deep learning compression algorithms (such as HiFiC, Swin-Transformer-based methods). Does the performance improvement of INN-RSIC depend on specific decoder characteristics? If directly applied to images compressed by JPEG2000, can the same gains be reproduced? It is recommended to supplement cross-algorithm comparison experiments.
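
The first question above, about channel splitting, scaling, shifting, and the Jacobian determinant, can be made concrete with a minimal sketch of a generic affine coupling step. This is not the paper's Equations 1-3, which are not reproduced in this record; `scale_net`, `shift_net`, and `clamp` are hypothetical stand-ins for learned sub-networks. For this generic form the Jacobian determinant equals exp(sum of log-scales), so it is nonzero but not 1 in general, and bounding the log-scale is one common way to keep both directions numerically stable.

```python
import math

def clamp(v, limit=2.0):
    # Bound the log-scale so exp() can neither overflow nor collapse to zero.
    return limit * math.tanh(v / limit)

def scale_net(x1):
    # Hypothetical stand-in for a learned network that outputs log-scales.
    return [clamp(0.5 * a) for a in x1]

def shift_net(x1):
    # Hypothetical stand-in for a learned network that outputs shifts.
    return [0.1 * a - 0.3 for a in x1]

def forward(x):
    # Split the channels, keep the first half, transform the second half.
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    s, t = scale_net(x1), shift_net(x1)
    y2 = [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]
    log_det = sum(s)  # log |det J| of this coupling step
    return x1 + y2, log_det

def inverse(y):
    # The first half passed through unchanged, so s and t can be recomputed.
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    s, t = scale_net(y1), shift_net(y1)
    x2 = [(b - ti) * math.exp(-si) for b, si, ti in zip(y2, s, t)]
    return y1 + x2
```

The inverse never has to invert `scale_net` or `shift_net` themselves, which is why the mapping stays exactly invertible for any choice of those sub-networks, up to floating-point error.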

Author Response

Thank you for your thorough review and valuable feedback on our manuscript. We have carefully addressed all the comments and incorporated the necessary revisions to enhance the quality of our work. Please see the attached PDF file.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The paper proposes a method for enhancing perceptual quality in remote sensing image compression using an invertible neural network. Overall, the paper is clear and complete. However, some issues should be addressed carefully.

In Figure 4, the proposed method does not achieve the best quantitative evaluation results among the compared methods.
What is the future work?
The language of the paper needs to be revised and polished by a professional.

Comments on the Quality of English Language

The language of the paper needs to be revised and polished by a professional.

Author Response

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The abbreviation “bpp” mentioned in Figure 1 (page 3) is provided without its full name and detailed definition. For clarity, could you explicitly state that “bpp” stands for “bits per pixel” and explain its role in measuring image compression efficiency in the introduction?
The term “decoded image” mentioned multiple times in the article has not been clearly explained. Could you define it in the methods section as “the image reconstructed by a compression algorithm”?
Why is Haar transform chosen over other wavelet bases (e.g., Daubechies) for separating high and low frequencies? It is recommended to elaborate on its advantages (e.g., computational efficiency or multi-resolution analysis) and validate this choice through ablation studies with alternative wavelet transforms.
The article only cites reference [28] to illustrate that the latent variables follow a Gaussian distribution, without explaining its rationality in conjunction with the task of this article. It is recommended to justify this choice theoretically (e.g., maximum entropy principle) or experimentally by comparing reconstruction performance under different distributions.
The perceived quality measurement in the article only relies on objective indicators such as LPIPS and has not been subjectively validated. However, the article has already mentioned that perception can better match human visual preferences. To comprehensively evaluate the quality of image perception, it is recommended to invite users to rate decoded/enhanced images and correlate these scores with LPIPS/PSNR metrics.
The description of “Compressor” in Figure 2 (a) of Section 3 is only labeled as “ELIC [17]” and its internal structure is not illustrated. To clearly demonstrate the structure of “Compressor”, it is recommended to provide a modular flowchart to clarify its architecture and explain how ELIC contributes to the overall framework.
The experimental results are unsatisfactory, and further experiments and analyses are required.
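
As background to the Haar question raised in this report, the following is a minimal sketch of a one-level orthonormal 2D Haar transform and its inverse in pure Python. It is a generic illustration, not the paper's implementation; the function names and the subband labels (`LL`, `LH`, `HL`, `HH`) are assumptions, and naming conventions for the detail bands vary between libraries. It shows how a single H x W channel becomes four H/2 x W/2 channels, and that the step is exactly invertible on its own.

```python
def haar2d(img):
    # img: list of rows with even height and width.
    rows, cols = len(img) // 2, len(img[0]) // 2
    names = ("LL", "LH", "HL", "HH")
    bands = {n: [[0.0] * cols for _ in range(rows)] for n in names}
    for i in range(rows):
        for j in range(cols):
            # Each non-overlapping 2x2 block is transformed independently.
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            bands["LL"][i][j] = (a + b + c + d) / 2  # low-frequency average
            bands["LH"][i][j] = (a - b + c - d) / 2  # left/right difference
            bands["HL"][i][j] = (a + b - c - d) / 2  # top/bottom difference
            bands["HH"][i][j] = (a - b - c + d) / 2  # diagonal difference
    return bands

def ihaar2d(bands):
    rows, cols = len(bands["LL"]), len(bands["LL"][0])
    img = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            ll, lh = bands["LL"][i][j], bands["LH"][i][j]
            hl, hh = bands["HL"][i][j], bands["HH"][i][j]
            # The 4x4 Haar matrix is symmetric and orthogonal, so the
            # inverse uses the same sign pattern as the forward step.
            img[2 * i][2 * j] = (ll + lh + hl + hh) / 2
            img[2 * i][2 * j + 1] = (ll - lh + hl - hh) / 2
            img[2 * i + 1][2 * j] = (ll + lh - hl - hh) / 2
            img[2 * i + 1][2 * j + 1] = (ll - lh - hl + hh) / 2
    return img
```

Each output coefficient costs only a few additions and one division, which is the usual computational-efficiency argument for Haar. Whether longer wavelet filters such as Daubechies would perform better in this framework is the empirical question the reviewer asks the authors to test.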

Author Response

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

Aiming to improve the perceptual quality of decoded images while maintaining high fidelity, this manuscript proposes an invertible neural network-based enhancement method (INN-RSIC) for remote sensing image compression. The key innovation lies in using an INN to model compression distortion and integrating a quantization module and a conditional generation module to optimize reconstruction. The manuscript is well written and its intentions are good, but the following questions need to be answered. Overall, a minor revision is recommended.

How does the proposed method perform on higher-resolution RS images? The experiments were conducted only on low-resolution datasets, which undermines the persuasiveness of the conclusions.
Ablation experiments were not conducted for channel expansion and the Haar transform. Would it be better to perform ablation experiments on these modules to make the work more persuasive?
Pay attention to the formatting mistakes in the manuscript, for example “IRN-RSIC” on page 11, Fig. 4, and so on.

Comments for author File: Comments.pdf

Author Response

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have addressed all my comments well. I have no further concerns.

Author Response

We thank the reviewer for the efforts to improve the quality of our manuscript.

Reviewer 3 Report

Comments and Suggestions for Authors

You have made some modifications in response to the suggestions from the first review and provided reasonable answers to some questions. However, the following issues raised in the first review have not yet been clearly addressed:

The authors still do not explain why the Haar transform was chosen instead of other wavelet bases (such as Daubechies) to separate high and low frequencies. It is suggested to explain its advantages in detail, such as computational efficiency or multi-resolution analysis, and to validate this choice through ablation studies using alternative wavelet transforms.
Merely citing reference [28] is not sufficient to demonstrate that it is reasonable for the latent variables to follow a Gaussian distribution. It is suggested that the authors explain the rationale in the context of this article's task, for example by justifying the choice theoretically (such as via the maximum entropy principle) or by comparing the reconstruction performance under different distributions.
The authors mention on the second page that "traditional evaluation indicators often cannot be consistent with human visual perception," but the improvement in perceptual quality is difficult to see intuitively in the visual results. To comprehensively evaluate perceptual image quality, it is recommended to invite users to rate the decoded/enhanced images and to correlate these scores with the LPIPS/PSNR metrics.
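
The last recommendation above, correlating user ratings with LPIPS/PSNR, could be carried out with a rank correlation. The sketch below is one possible approach, assuming Spearman's coefficient is an acceptable choice; the function names are hypothetical and no study data is implied. It computes the coefficient in pure Python, averaging ranks for tied values.

```python
import math

def ranks(values):
    # Assign 1-based ranks, giving tied values the average of their ranks.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(a, b):
    # Pearson correlation of the two rank vectors.
    # Undefined (division by zero) if either input is constant.
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)
```

Because lower LPIPS indicates better perceptual quality while higher user ratings are better, a metric that tracks human judgment well would be expected to give a coefficient close to -1 against the ratings; for PSNR, where higher is better, agreement would show up as a value close to +1.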
