Perceptual Quality Assessment for Pansharpened Images Based on Deep Feature Similarity Measure
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
In this paper, the authors introduce an innovative no-reference quality assessment network. The proposed method is novel, and the experiments are comprehensive. However, some problems remain to be solved. The detailed comments are as follows:
1. The VGG network is used as the basic feature extraction network; its advantages should be introduced.
2. The advantages and disadvantages of the proposed method should be described, and future work should be included.
3. The parameter λ in equation (10) lacks an explanation in the paper; it is only discussed in the ablation experiments. Please provide an explanation.
4. Please recheck the paper to ensure that its formatting is standardized. In particular, the formulas should be centered.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This paper proposes a no-reference quality assessment network (DFSM-net) to assess the quality of pansharpened images. It proposes a Siamese feature extraction and similarity measure (FESM) module that extracts the spatial/spectral deep features and calculates the similarity of the corresponding pair of deep features to obtain the spatial/spectral feature parameters, which represent the spatial/spectral distortion of the fused image. Moreover, it proposes a novel loss that quantifies the variations among the different fusion methods in a batch to improve the network's accuracy and robustness. The experimental results show that the proposed method is superior to all comparison methods and achieves the highest indices. In sum, the proposed idea is novel. However, some points should be further explained before possible acceptance.
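For orientation, a minimal sketch (not the authors' code) of how a Siamese feature-extraction and similarity-measure branch of this kind could be realized is given below; the VGG-16 backbone, the cosine similarity, and all identifiers are assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a Siamese feature extraction + similarity branch.
# Assumes 3-channel (RGB-like) inputs, since the ImageNet VGG expects them.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class SiameseFESM(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared VGG-16 convolutional backbone: both images of a pair pass
        # through the same weights (the Siamese part).
        self.backbone = vgg16(weights="IMAGENET1K_V1").features

    def forward(self, reference_like, fused):
        # Deep features of the reference-like image (e.g. interpolated MS or
        # Pan) and of the pansharpened result.
        f_ref = self.backbone(reference_like)
        f_fus = self.backbone(fused)
        # Cosine similarity over the channel dimension, averaged spatially,
        # as one plausible "feature parameter" summarizing the distortion.
        sim = nn.functional.cosine_similarity(f_ref, f_fus, dim=1)
        return sim.mean(dim=(1, 2))  # one scalar per image in the batch
```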
1. For the weight parameter lambda in the loss function, the ablation experiment gives an optimal value of 4. Whether the indices remain optimal if lambda is increased further should be supported with experimental evidence (a hypothetical sketch of such a lambda-weighted loss follows this list).
2. The experimental results in Figure 4 are incomplete; please reformat and update the figure.
3. The explanations of the symbols in formula (1) are missing; please provide them. Table 2 also lacks clarity and requires further explanation; please explain the meaning of the numbers associated with the scores.
4. In the second section (the proposed method), it is suggested that more emphasis be placed on the novelty of the proposed method, as well as on its motivation and innovation.
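Concerning point 1 above, lambda can be read as the weight balancing a primary fidelity term against the batch-variation term. Since equation (10) is not reproduced here, the sketch below is purely hypothetical in both its terms and its names.

```python
import torch

def total_loss(pred_scores, target_scores, lam=4.0):
    """Hypothetical combined loss: a primary regression term plus a
    lambda-weighted batch-variation term (the exact form of equation (10)
    in the paper may differ)."""
    # Primary term: closeness of predicted quality scores to the targets.
    l_primary = torch.mean((pred_scores - target_scores) ** 2)
    # Secondary term: preserve the relative spread of scores across the
    # fusion methods contained in the batch.
    l_batch = torch.mean(
        (torch.diff(pred_scores) - torch.diff(target_scores)) ** 2
    )
    return l_primary + lam * l_batch
```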
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The fashionable practice of developing pansharpening methods based on deep learning has generated something very singular: a way to assess quality at full scale that is itself based on learning, yet is proposed for no-reference full-scale evaluations.
If the authors wish to pursue the idea they have developed, they must first replace the words “no-reference” in the title and wherever applicable with something like “perceptual” or “subjective”, because it seems to me that the sole merit of the present submission is to avoid organizing subjective trials for perceptual assessments.
A series of remarks follows.
The introduction fails to expose the sole reason why present no-reference indexes sometimes produce puzzling results. The reason, identified e.g. in “Full-Resolution Quality Assessment of Pansharpening: Theoretical and hands-on approaches”, IEEE GRSM, DOI 10.1109/MGRS.2022.3170092, is that inconsistencies may be caused by aliasing artifacts, which appear in the interpolated MS but are missing in the fusion products of CS methods, and especially by local shifts between the interpolated MS and Pan originating from uncorrected parallax views. It is curious that, instead of using AI to correct the data, the authors wish to use AI to train the assessment on defective data and draw conclusions on the merits of one method rather than another. Incidentally, the data may be corrected without AI.
The benchmarking is unfair: apart from the fact that three instruments out of six are dismissed (IKONOS-2, QuickBird-2 and WorldView-4), the comparison is mostly with CNN-based methods, apart from ATW-M2, which dates back to 2000 and is notable only for pioneering the use of the “à trous” wavelet transform; GSA, which generalizes the widespread Gram-Schmidt spectral sharpening but is still dated; and CNMF, which was developed specifically for HS data, not for MS pansharpening. If the methods are not reproducible (as, by definition, all learning-based methods are not) and the evaluation is not reproducible, any conclusions drawn from the assessment are improper and gratuitous. In a possible revised version, I recommend using up-to-date, nonparametric, reproducible benchmark methods whose performance is known.
If the authors want to demonstrate that their subjective method has objective validity, they should split their large training dataset of pansharpened images into two parts, Subset 1 and Subset 2, and perform two separate trainings, one on Subset 1 and another on Subset 2, yielding two assessment methods, Method 1 and Method 2; then they should carry out forward and cross assessments, i.e., Subset 1 assessed by both Method 1 and Method 2, and vice versa. If the forward assessments agree with the crossed assessments, the subjective method developed is at least reproducible in its results.
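A minimal sketch of this forward/cross-assessment protocol is given below; train_assessor, evaluate, and agreement are hypothetical stand-ins for the authors' actual training, scoring, and correlation code.

```python
# Hypothetical sketch of the suggested forward/cross assessment protocol.
import random

def cross_assessment(dataset, train_assessor, evaluate, agreement, seed=0):
    # Split the pansharpened training set into two disjoint halves.
    random.seed(seed)
    items = list(dataset)
    random.shuffle(items)
    half = len(items) // 2
    subset1, subset2 = items[:half], items[half:]

    # Train one assessment method per subset.
    method1 = train_assessor(subset1)
    method2 = train_assessor(subset2)

    # Forward assessments: each subset scored by the method trained on it.
    fwd1 = evaluate(method1, subset1)
    fwd2 = evaluate(method2, subset2)
    # Cross assessments: each subset scored by the other method.
    crs1 = evaluate(method2, subset1)
    crs2 = evaluate(method1, subset2)

    # If forward and crossed scores agree (e.g. high rank correlation),
    # the assessment is at least reproducible in its results.
    return agreement(fwd1, crs1), agreement(fwd2, crs2)
```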
Please check references because some are incomplete, e.g., [23].
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 3 Report
Comments and Suggestions for Authors
I have no further concerns to raise.