Communication
Peer-Review Record

Physics-Informed Gaussian-Enforced Separated-Band Convolutional Conversion Network for Moving Object Satellite Image Conversion

by Andrew J. Lew *, Timothy Perkins, Ethan Brewer, Paul Corlies and Robert Sundberg
Reviewer 1:
Reviewer 2:
Submission received: 4 June 2025 / Revised: 16 July 2025 / Accepted: 21 July 2025 / Published: 23 July 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Comments on the paper:

 

The paper addresses a relevant and timely topic in the field of satellite remote sensing, namely the interoperability between sensors from different satellite platforms, such as WorldView-3 and SuperDove. The manuscript clearly identifies the limitations of current approaches, particularly the spectral artifacts caused by temporal shifts, and proposes an original deep learning-based model to mitigate these effects.

 

While the contribution is promising, several aspects could be improved to enhance the clarity, readability, and scientific impact of the paper:

 

Strengths:

The paper focuses on a real and impactful challenge for environmental monitoring and Earth observation.

 

The technical issue is well justified and illustrated, especially the artifacts due to sequential band acquisition.

 

The proposed model (PIGESBCCN) introduces an original architecture that incorporates both physical principles and temporal information.

 

Suggestions for Improvement:

Model Name:

The model name PIGESBCCN is quite long and complex. A shorter or more intuitive acronym would improve readability and memorability.

 

Terminology Clarification:

Some technical terms (e.g., spectral band resampling or radiometric scaling) would benefit from brief explanations to ensure clarity, especially for a multidisciplinary audience.

 

Structure and Logical Flow:

The paper would benefit from a clearer distinction between:

 

The causes of spectral distortions (e.g., sequential capture)

 

Their consequences (e.g., misalignment for moving objects)

 

The proposed solution, and how it directly addresses the identified issues.

 

Temporal Correlation Modeling:

The manuscript mentions modeling temporal information, but a more detailed explanation is needed to understand how temporal relationships between bands are captured and leveraged in the network.

 

Experimental Validation (if not already included):

A quantitative evaluation of the proposed model, compared to existing methods, would strongly support its effectiveness. Metrics related to reconstruction quality or spectral fidelity would strengthen the scientific contribution.

 

Comments for author File: Comments.pdf

Author Response

Comments 1: The paper addresses a relevant and timely topic in the field of satellite remote sensing, namely the interoperability between sensors from different satellite platforms, such as WorldView-3 and SuperDove. The manuscript clearly identifies the limitations of current approaches, particularly the spectral artifacts caused by temporal shifts, and proposes an original deep learning-based model to mitigate these effects.

While the contribution is promising, several aspects could be improved to enhance the clarity, readability, and scientific impact of the paper:

Response 1: We thank the reviewer for their thoughtful comments for improving the paper. Please find the detailed responses below and corresponding revisions highlighted in the re-submitted files.

 

Comments 2: Strengths:

The paper focuses on a real and impactful challenge for environmental monitoring and Earth observation.

The technical issue is well justified and illustrated, especially the artifacts due to sequential band acquisition.

The proposed model (PIGESBCCN) introduces an original architecture that incorporates both physical principles and temporal information.

Response 2: We thank the reviewer for noting strengths of the work.

 

Comments 3: Suggestions for Improvement:

Model Name:

The model name PIGESBCCN is quite long and complex. A shorter or more intuitive acronym would improve readability and memorability.

Response 3: We agree that the model name is quite long; however, when pronounced it sounds like “Pigs Bacon”, which we think is both easy to pronounce and memorable.

 

Comments 4: Terminology Clarification:

Some technical terms (e.g., spectral band resampling or radiometric scaling) would benefit from brief explanations to ensure clarity, especially for a multidisciplinary audience.

Response 4: We thank the reviewer for this feedback. We have replaced the technical terms mentioned with more explanatory language for greater clarity for a multidisciplinary audience. Specifically, in the Abstract (page 1, line 10), “spectral band resampling” is replaced with “changing image channel wavelengths” and “radiometric scaling” with “per-band intensity scales”. We have also added a brief explanation in lines 11-12 of the Abstract to ensure clarity of these phrases, stating that “different sensors can acquire imagery of the same scene at different wavelengths and intensities”.

 

Comments 5: Structure and Logical Flow:

The paper would benefit from a clearer distinction between:

The causes of spectral distortions (e.g., sequential capture)

Their consequences (e.g., misalignment for moving objects)

The proposed solution, and how it directly addresses the identified issues.

Response 5: We thank the reviewer for their comment distinguishing these three logical steps and restructure the Abstract on page 1 according to the recommended flow, with additions to lines 14-15, 16-17, and 20. The new abstract is:

“Integrating diverse image datasets acquired from different satellites is challenging. Converting images from one sensor to another, like from WorldView-3 (WV) to SuperDove (SD), involves both changing image channel wavelengths and per-band intensity scales because different sensors can acquire imagery of the same scene at different wavelengths and intensities. A parametrized convolutional network approach has shown promise converting across sensor domains, but it introduces distortion artefacts when objects are in motion. The cause of spectral distortion is due to temporal delays between sequential multispectral band acquisitions. This can result in spuriously blurred images of moving objects in the converted imagery, and consequently misaligned moving object locations across image bands. To resolve this, we propose an enhanced model, the Physics-Informed Gaussian-Enforced Separated-Band Convolutional Conversion Network (PIGESBCCN), which better accounts for known spatial, spectral, and temporal correlations between bands via band reordering and branched model architecture.”

 

Comments 6: Temporal Correlation Modeling:

The manuscript mentions modeling temporal information, but a more detailed explanation is needed to understand how temporal relationships between bands are captured and leveraged in the network.

Response 6: We thank the reviewer for this comment and add a more detailed explanation to the first paragraph of the Results section, page 5, lines 158-167. Specifically, we detail that:

“WV sensors acquire their 8 spectral bands not in wavelength order. Rather, the spectral indices are temporally acquired in the order of 1, 8, 7, 5, 4, 3, 6, then 2. Thus, if one were to conduct convolutions over spectrally adjacent bands (i.e. bands 1 and 2), this would lead to a large target blurring artefact, as the temporal gap between band 2 (last temporally acquired) and band 1 (first temporally acquired) would result in greatly different target locations. Instead, we leverage the known temporal order of acquisition to first rearrange the bands, such that target motion is smooth between bands and convolutions act on temporally adjacent bands. Then, we architect the model with separate convolutional branches for each output band as an additional precaution against large cross band target position blurring.”
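The band reordering described in the quoted passage can be sketched in a few lines. The following is a minimal illustration assuming a NumPy `(bands, H, W)` array layout; the array shapes and function names are assumptions for illustration, as the actual PIGESBCCN implementation is not given in the response:

```python
import numpy as np

# WV temporal acquisition order of the 8 spectral bands, as stated in the
# response (bands 1, 8, 7, 5, 4, 3, 6, then 2), converted to 0-indexed positions.
TEMPORAL_ORDER = [0, 7, 6, 4, 3, 2, 5, 1]

def reorder_bands_temporally(cube):
    """Rearrange a (bands, H, W) cube from spectral order into temporal
    acquisition order, so convolutions act on temporally adjacent bands."""
    return cube[TEMPORAL_ORDER, :, :]

def restore_spectral_order(cube):
    """Invert the rearrangement, returning bands to spectral order."""
    return cube[np.argsort(TEMPORAL_ORDER), :, :]

# Example: an 8-band cube where each band is filled with its spectral index.
cube = np.stack([np.full((4, 4), b) for b in range(1, 9)])
temporal = reorder_bands_temporally(cube)
# temporal[:, 0, 0] is now [1, 8, 7, 5, 4, 3, 6, 2]
```

Convolving over adjacent slices of `temporal` then mixes only bands captured close together in time, which is the property the response relies on to keep moving-target motion smooth between bands.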

 

Comments 7: Experimental Validation (if not already included):

A quantitative evaluation of the proposed model, compared to existing methods, would strongly support its effectiveness. Metrics related to reconstruction quality or spectral fidelity would strengthen the scientific contribution.

Response 7: We thank the reviewer for this comment. While there are few existing methods for spectral conversion with moving targets to cite as a direct comparison, we add the following detail on reconstruction quality, using the Root Mean Squared Error metric, to the last paragraph of the Results section, page 10, lines 279-288:

“More traditional methods of image conversion, without the band separation scheme outlined in this work, yield converted images where the presence of a moving target in each band blurs over all output bands. In those cases, target reconstruction results in a single average effective position instead of varying per band. While this single effective position may be close to the positions of temporally internal bands (such as bands 4 or 5), other bands (such as 1 or 2) will contribute to large errors. Specifically, this single effective position yields Root Mean Squared Error (RMSE) of 1.8 and 2.7 pixels in the X and Y coordinates, respectively. In contrast, the PIGESBCCN converted target with band localized positions yields RMSE of only 0.49 and 0.41 pixels for the X and Y coordinates, respectively.”
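The single-position-versus-band-localized RMSE comparison described above can be computed with a short helper. The function and the example trajectory below are illustrative assumptions (per-band target centroids as `(x, y)` pixel coordinates), not the paper's actual data:

```python
import numpy as np

def position_rmse(estimated, truth):
    """RMSE between per-band estimated and true target positions, computed
    separately for the X and Y coordinates (in pixels). Both inputs have
    shape (n_bands, 2), each row an (x, y) position."""
    diff = np.asarray(estimated, float) - np.asarray(truth, float)
    return np.sqrt(np.mean(diff ** 2, axis=0))

# Hypothetical target moving one pixel per band in X across 8 bands.
truth = np.array([[x, 0.0] for x in range(8)])
# A traditional conversion collapses the target to one average position,
# repeated for every band.
single = np.repeat(truth.mean(axis=0, keepdims=True), 8, axis=0)
rmse_x, rmse_y = position_rmse(single, truth)  # rmse_x ~ 2.29, rmse_y = 0
```

Band-localized positions, as PIGESBCCN aims to produce, would drive both values toward zero.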

Reviewer 2 Report

Comments and Suggestions for Authors

In this manuscript, the authors proposed a method named PIGESBCCN to resolve the problem of spuriously blurred images in the converted imagery. In general, the manuscript is organized poorly, and the method is not stated well. A scientific paper should be novel and clear. There are several suggestions and comments for the manuscript:

  1. The Abstract and Introduction should be reorganized; their logic is chaotic, and the first half of the Abstract is irrelevant to the research.
  2. Lines 58 and 67: the citation format is wrong.
  3. Lines 75-150: what is “Physics-Informed”? Only a Gaussian process is used in the proposed method; the name of the proposed method is imprecise and exaggerated.
  4. Figures 1, 6, and 7 lack subpanel numbers.
  5. You state the problem is important for satellite imagery, but the whole manuscript contains only experiments using simulated data. Could no real satellite images be used for the experiments? If so, the problem would seem unimportant.

Author Response

Comments 1: In this manuscript, the authors proposed a method named PIGESBCCN to resolve the problem of spuriously blurred images in the converted imagery. In general, the manuscript is organized poorly, and the method is not stated well. A scientific paper should be novel and clear. There are several suggestions and comments for the manuscript:

Response 1: We thank the reviewer for their feedback and for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions highlighted in the re-submitted files.

 

Comments 2: The Abstract and Introduction should be reorganized; their logic is chaotic, and the first half of the Abstract is irrelevant to the research.

Response 2: We omit the first half of the Abstract and reorganize the language to focus on the following chain of logic: (a) the cause of spectral distortions (i.e., sequential capture), (b) their consequences (i.e., misalignment for moving objects), and (c) the proposed solution and how it addresses the identified issues (i.e., a model incorporating band reordering and a branched model architecture). The new Abstract on page 1 is as follows, with specific additions for clarity in lines 14-15, 16-17, and 20:

“Integrating diverse image datasets acquired from different satellites is challenging. Converting images from one sensor to another, like from WorldView-3 (WV) to SuperDove (SD), involves both changing image channel wavelengths and per-band intensity scales because different sensors can acquire imagery of the same scene at different wavelengths and intensities. A parametrized convolutional network approach has shown promise converting across sensor domains, but it introduces distortion artefacts when objects are in motion. The cause of spectral distortion is due to temporal delays between sequential multispectral band acquisitions. This can result in spuriously blurred images of moving objects in the converted imagery, and consequently misaligned moving object locations across image bands. To resolve this, we propose an enhanced model, the Physics-Informed Gaussian-Enforced Separated-Band Convolutional Conversion Network (PIGESBCCN), which better accounts for known spatial, spectral, and temporal correlations between bands via band reordering and branched model architecture.”

We additionally amend the Introduction to improve the chain of logic in lines 58-59 and 70-72, on the usage of simulated data in this work.

 

Comments 3: Lines 58 and 67: the citation format is wrong.

Response 3: We thank the reviewer for their careful inspection and check the formatting of the identified citations (6, 7, and 8) as follows, with changes highlighted:

“[6]       R. L. Sundberg, J. Gruninger, M. Nosek, J. Burks and E. Fontaine, "Quick image display (QUID) model for rapid real-time target imagery and spectral signatures," Technologies for Synthetic Environments: Hardware-in-the-Loop Testing II. SPIE, vol. 3084, pp. 272-281, 1997. “

This is confirmed as correct.

 

“[7]       R. Sundberg, S. Adler-Golden, T. Perkins and K. Vongsy, "Detection of spectrally varying BRDF materials in hyperspectral reflectance imagery," 2015 7th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), pp. 1-4, 2015.”

“[7]       R. Sundberg, S. Adler-Golden, T. Perkins and K. Vongsy, "Detection of spectrally varying BRDF materials in hyperspectral reflectance imagery," 7th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), pp. 1-4, 2015.“

We omit the repeated “2015” year.

 

“[8]       T. Perkins, R. Sundberg, J. Cordell, Z. Tun and M. Owen, "Real-time target motion animation for missile warning system testing," Proc. SPIE, vol. 6208, pp. 1-12, 2006.”

“[8]       T. Perkins, R. Sundberg, J. Cordell, Z. Tun and M. Owen, "Real-time target motion animation for missile warning system testing," Technologies for Synthetic Environments: Hardware-in-the-Loop Testing XI. SPIE, vol. 6208, pp. 1-12, 2006.”

We change to the proper title of the SPIE proceedings, to be consistent with the format of the other citations.

 

Comments 4: Lines 75-150: what is “Physics-Informed”? Only a Gaussian process is used in the proposed method; the name of the proposed method is imprecise and exaggerated.

Response 4: We thank the reviewer for this key question and amend the manuscript to explicitly identify where and how physics-based knowledge or data are incorporated into the proposed method.

Firstly, it is known that the WV sensor and the SD sensor have different per-band relative blurring factors. Thus, we incorporate specific Gaussian blurs scaled by factors of 4.2576, 4.2679, 4.2670, 4.2498, 4.2841, 4.4390, 4.2026, and 4.3625 for the 8 image channels respectively. We incorporate that physical knowledge to inform the model, instead of just hoping the ML model parameters learn the correct relative blurring. Next, we know that the WV sensor temporally acquires the spectral indices in the order of band 1, 8, 7, 5, 4, 3, 6, then 2. Thus, to enforce this known temporal relationship into the model architecture, we rearrange the bands before further convolutional processing. This rearrangement informs the model of the correct physical time progression, instead of just hoping the ML model parameters learn the correct band reordering. Furthermore, we know the correct output has band localized object positions. This physical knowledge informs our design of the model architecture, as separate convolutional branches per output image channel, instead of just hoping the ML model parameters learn correct band separation. In these ways, our knowledge of the physical system informs our methodology, solidifies known relationships between the data, and ameliorates spurious blur, band reordering, and band mixing errors that may otherwise manifest with a more naïve non-physics-informed approach.
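The first of these physics-informed steps, the per-band blur, can be illustrated concretely. The sketch below simply applies a Gaussian blur scaled by the quoted factors using `scipy.ndimage.gaussian_filter`; the `base_sigma` parameter and the point at which the blur enters the network are assumptions for illustration, not details from the paper:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Per-band relative blur factors quoted in the response, in spectral band order.
BLUR_FACTORS = [4.2576, 4.2679, 4.2670, 4.2498,
                4.2841, 4.4390, 4.2026, 4.3625]

def apply_physical_blur(cube, base_sigma=1.0):
    """Blur each band of a (bands, H, W) cube with a Gaussian whose width is
    scaled by that band's known sensor blur factor, encoding the physical
    knowledge directly rather than leaving it to learned parameters.
    base_sigma is a hypothetical overall scale."""
    return np.stack([
        gaussian_filter(band, sigma=base_sigma * factor)
        for band, factor in zip(cube, BLUR_FACTORS)
    ])
```

Fixing these blur widths (rather than learning them) is the sense in which the blur stage is "physics-informed": the network only has to learn what the physics does not already determine.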

This explanation has been added to the Results section prior to Figure 2 illustrating the model architecture, on page 5, lines 170-177, 180-183, and 186-190.

 

Comments 5: Figures 1, 6, and 7 lack subpanel numbers.

Response 5: We thank the reviewer for this comment; subpanel indicators have now been added to Figures 1, 6, and 7.

 

Comments 6: You state the problem is important for satellite imagery, but the whole manuscript contains only experiments using simulated data. Could no real satellite images be used for the experiments? If so, the problem would seem unimportant.

Response 6: We thank the reviewer for noting that the scope of this work is limited to simulated data, and we clarify our focus on simulation in the Introduction, lines 58-59 and 70-72.

We agree that treatment of moving object blurring might not be important for general uses such as agricultural monitoring or segmentation applications over large time windows. However, correct treatment of moving objects is important for select applications such as small target detection, and we look forward to handling such experimental data in future work.

This discussion on future work has been added to the last paragraph of the Discussion section, page 11, lines 316-320.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The revised version is OK.
