Relative Pose Estimation of an Uncooperative Target with Camera Marker Detection
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper presents a novel approach that uses image processing from a chaser spacecraft to detect structural markers on the ENVISAT satellite, together with an estimation framework based on an Unscented Kalman Filter (UKF), to facilitate its safe de-orbiting. The effectiveness of the method is validated through simulation.
The overall structure of the paper is reasonable, but there are also some issues. Here are some suggestions for the paper:
- The introduction heavily emphasizes the significance of CNNs but lacks an analysis of the current state of research in this field, making it difficult for readers to gauge the paper's standing relative to existing work.
- The mathematical description of the 2D-to-3D conversion is unclear, making the analytical process difficult to follow.
- This paper primarily addresses the static case. However, if the spacecraft itself is in motion, the impact on the relative pose measurement becomes significantly more pronounced. The discussion of this aspect should be expanded to provide a more comprehensive analysis.
- This paper focuses on applying CNNs without addressing the improvements they offer over other related studies. It is recommended to expand the simulation section to highlight these advancements and comparative benefits.
- The conclusion is overly verbose and could benefit from being more concise.
Author Response
Please see the attachment.
In the attached document, you can find our answers to your comments. Thank you so much for your valuable comments and insights into our work.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The paper “Relative Pose Estimation of an Uncooperative Target with Camera Marker Detection” is devoted to the development of an algorithm and the investigation of its performance for the specified case study. The paper is well written and will be of interest to specialists in motion determination. However, several essential aspects should be clarified by the authors before publication.
The main contribution of the paper, as stated by the authors, is “the unique approach in which the process noise covariance matrix is adjusted, reflecting the increased uncertainty in state predictions during the eclipse”. But this approach is not clearly described in the text; more clarification is required. It is confusing that the adjustment of the process noise covariance matrix Q is used to enlarge the uncertainties in the dynamical model during the absence of measurements: the lack of measurements cannot, by itself, influence the process noise of the system. It is even more confusing that the authors consider no disturbances or uncertainties in the motion equations, as stated in line 130. What is the meaning of the process noise covariance matrix in this case?
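For reference, one common scheme in the literature is to inflate Q at prediction time as a function of the elapsed time since the last accepted measurement. The sketch below is purely illustrative of what such an adjustment could look like (the linear inflation law, time constant, cap, and state dimensions are arbitrary assumptions, not the authors' method):

```python
import numpy as np

def inflate_process_noise(Q_nominal, dt_since_last_meas, tau=60.0, max_factor=10.0):
    """Scale the nominal process noise covariance during a measurement outage.

    The inflation factor grows linearly with the time since the last
    measurement and is capped at max_factor, so the predicted state
    covariance widens while no corrections are available.
    """
    factor = min(1.0 + dt_since_last_meas / tau, max_factor)
    return factor * Q_nominal

# Example: nominal Q for a 6-state relative-motion model (illustrative values)
Q = np.diag([1e-6] * 3 + [1e-8] * 3)

# Two minutes into the eclipse, Q is inflated threefold with these settings
Q_eclipse = inflate_process_noise(Q, dt_since_last_meas=120.0)
```

If the authors use something of this kind, the inflation law and its parameters should be stated explicitly so the adjustment is reproducible.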
It is suggested to make the paper title more specific by adding keywords related to the main contribution. As it stands, the title is quite generic for the area of relative pose estimation using monocular vision.
The considered rectangular body, with its 8 corner points used as markers, is symmetric, which implies ambiguity in the marker-association stage in the case of high initial attitude errors. Could the authors comment on how this issue is addressed?
The value of the measurement noise covariance matrix R should depend on the relative distance between the chaser and the target. How was the value of 0.62 selected (line 305)? And later, the values of 0.02 and 0.2 in line 503?
Some of the images of the rectangular body in Figs. 3, 9, and 10 have a strange shape, as if the body were not rectangular. Please check that the projections are correct.
In Section 6, all simulation parameter values should be provided: the inertia tensor, the UKF parameters (α, β, κ), the image size, etc.
The adjustment of the values of the matrix Q is not demonstrated in the examples provided in Section 6. Table 3 shows the values of Q for the case of available measurements, but what are the adjusted values during the eclipse? What is the methodology for such an adjustment?
Author Response
Please see the attachment.
In the attached document, you can find our answers to your comments. Thank you so much for your valuable comments and insights into our work.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
- The authors do not mention the very large body of work, predating deep-learning-enabled methods, that uses perspective-n-point (PnP) [1], blob detection [2], circular regions [3], or region-based methods [4]. The authors do not discuss why these methods were passed over in favor of a CNN, or the pros and cons of that comparison.
- [1] Sharma, S. and D’Amico, S., “Comparative Assessment of Techniques for Initial Pose Estimation Using Monocular Vision”, Acta Astronautica, Vol. 123, 2016, pp. 435–445.
- [2] Zhang, G. et al., “Cooperative Relative Navigation for Space Rendezvous and Proximity Operations Using Controlled Active Vision”, Journal of Field Robotics, Vol. 00, No. 0, 2015, pp. 1–24, DOI: 10.1002/rob.21575.
- [3] Liu, C. and Hu, W., “Relative Pose Estimation for Cylinder-Shaped Spacecrafts Using Single Image”, IEEE Transactions on Aerospace and Electronic Systems, Vol. 50, No. 4, pp. 3036–3056.
- [4] Shi, J. and Ulrich, S., “Uncooperative Spacecraft Pose Estimation Using Monocular Monochromatic Images”, Journal of Spacecraft and Rockets, Vol. 58, No. 2, pp. 284–301.
- Line 31: provide a reference for the claim that CNNs are more robust than feature-based methods.
- Section 2: provide an end-to-end pipeline diagram of your method
- Sec. 3.1: this section is vague in discussing the network used in this work; it describes general deep-learning theory, which is not an efficient use of journal page allocation. Provide the specifics of the network you used: was it your own design, or was it modified from an existing network? What are the sizes of the various layers? What parameters were chosen, and why? Is Figure 4 the actual architecture (ConvNet layers followed by two MLP layers)? What is the detection head? There needs to be sufficient detail for someone to reproduce this network model and your work.
- Sec. 5.2, line 358: you describe keypoints being detected by the network, but you do not explain how this is done in Sec. 3.1. AlexNet is a recognition model; what layer head have you put on your model to detect corner keypoints?
- Sec. 5.5: why would you need a CNN to detect corners if this can be done simply with Harris corner detection? Provide the rationale and data showing why a computationally more costly CNN is preferred over traditional corner-detection methods.
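To make the suggested baseline concrete: a Harris detector needs no learned weights at all; it simply thresholds the response R = det(M) − k·trace(M)², where M is the local structure tensor of the image gradients. A minimal NumPy sketch (illustrative only; window size and k are arbitrary textbook values, not taken from the paper):

```python
import numpy as np

def harris_response(img, k=0.04, r=2):
    """Harris corner response R = det(M) - k * trace(M)^2 per pixel,
    where M is the structure tensor summed over a (2r+1)x(2r+1) window."""
    Iy, Ix = np.gradient(img.astype(float))  # central-difference gradients

    def window_sum(a):
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
        return out

    Sxx = window_sum(Ix * Ix)
    Syy = window_sum(Iy * Iy)
    Sxy = window_sum(Ix * Iy)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2

# Synthetic test image: a bright 20x20 square on a dark background
img = np.zeros((50, 50))
img[15:35, 15:35] = 1.0

R = harris_response(img)
peak = np.unravel_index(np.argmax(R), R.shape)
# The strongest response lies at one of the square's four corners
```

A side-by-side comparison of this kind of detector against the CNN (accuracy and runtime) would substantiate the choice.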
- Figure 10: it seems a misplaced priority to focus on noise and blur errors when the model under test is a synthetic box with smooth surfaces. Once the full body of Envisat is included, with its solar-panel features, body-panel features, and shadow effects, the images will be much more difficult to process for corner features, even before noise and blur are artificially introduced.
- Figures 11–13: provide a legend for the different colored lines, and add grid lines.
- For future work, I recommend using a realistic vision model and lighting conditions for Envisat; you may find the solution much harder to compute with realistic corner-feature images. For a journal paper, this work is expected to be completed at journal-level fidelity.
- This paper does not compare the proposed method against current state-of-the-art methods; rather, it uses well-established techniques. I fail to see the novelty of this work compared to work done in the past.
Author Response
Please see the attachment.
In the attached document, you can find our answers to your comments. Thank you so much for your valuable comments and insights into our work.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have addressed each of the review comments individually and have supplemented the paper with key experimental data and theoretical analysis. As a result, the revised paper has significantly improved in terms of academic rigor, methodological completeness, and reliability of the conclusions. Therefore, I recommend accepting this paper.
Author Response
Thank you so much for your comments all along. Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have addressed all of the reviewer's comments; the paper can be accepted for publication.
Author Response
Thank you so much for your valuable comments all along. Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Line 256: provide a reference for TinyCornerNET, or is this your invention? If so, you should state where you derived this idea from, provide references, and also list this invention as a contribution of the paper.
Figure 6: this graph is difficult to understand. On which image was the test completed; is it Figure 3? On how many images was this test conducted? What were the variation conditions? The probability for 6 detected markers exceeds 1; do you mean for these bars to be compared side by side? I am surprised that the Harris corner detector failed to detect more than 4 markers. Normally, keypoint accuracy is compared via precision versus recall, since corners are not only detected but should also be evaluated by the pixel distance of each detected corner to the ground truth.
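To clarify the suggested evaluation: a detection counts as a true positive when it lies within a pixel tolerance of an unmatched ground-truth corner, and precision/recall follow from the match counts. An illustrative sketch (the tolerance, corner layout, and detection samples are hypothetical):

```python
import math

def keypoint_precision_recall(detected, ground_truth, tol_px=3.0):
    """Greedily match each detection to the nearest unmatched ground-truth
    corner; a match within tol_px pixels counts as a true positive."""
    remaining = list(ground_truth)
    tp = 0
    for d in detected:
        if not remaining:
            break
        nearest = min(remaining, key=lambda g: math.dist(d, g))
        if math.dist(d, nearest) <= tol_px:
            tp += 1
            remaining.remove(nearest)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Hypothetical example: 8 true corners, 6 detections (5 accurate, 1 spurious)
gt = [(10, 10), (10, 90), (90, 10), (90, 90),
      (50, 10), (50, 90), (10, 50), (90, 50)]
det = [(11, 10), (9, 91), (90, 12), (88, 89), (50, 11), (30, 30)]
p, r = keypoint_precision_recall(det, gt)  # precision 5/6, recall 5/8
```

Sweeping the detector threshold and plotting such (precision, recall) pairs would give the curve suggested above.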
Author Response
Thank you so much for your valuable comments all along. Please see the attachment.
Author Response File: Author Response.pdf