Article

Robust Pose Estimation for Noncooperative Spacecraft Under Rapid Inter-Frame Motion: A Two-Stage Point Cloud Registration Approach

Mingyuan Zhao and Long Xu
1 National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Institute of Fundamental Physics and Quantum Technology, Ningbo University, Ningbo 315211, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(11), 1944; https://doi.org/10.3390/rs17111944
Submission received: 24 March 2025 / Revised: 13 May 2025 / Accepted: 23 May 2025 / Published: 4 June 2025

Abstract

This paper addresses the challenge of robust pose estimation for spacecraft under rapid inter-frame motion, proposing a two-stage point cloud registration framework. The first stage computes a coarse pose estimate by leveraging Fast Point Feature Histogram (FPFH) descriptors with random sample consensus (RANSAC) for correspondence matching, effectively handling significant positional displacements. The second stage refines the solution through geometry-aware fine registration using raw point cloud data, enhancing precision via a multi-scale iterative ICP-like framework. To validate the approach, we simulate time-of-flight (ToF) sensor measurements by rendering NASA’s public 3D spacecraft models and obtain 3D point clouds by back-projecting the depth measurements to 3D space. Comprehensive experiments demonstrate superior performance over several state-of-the-art methods in both accuracy and robustness under rapid inter-frame motion scenarios. The dual-stage architecture proves effective in maintaining tracking continuity while mitigating error accumulation from fast relative motion, showing promise for autonomous spacecraft proximity operations.

1. Introduction

In the vast expanse of space, the precise determination of a space target’s position and orientation—known as pose estimation—is a task of paramount importance. It is the cornerstone of numerous space missions, ranging from routine satellite operations to complex tasks such as autonomous docking, orbital maintenance, and planetary exploration [1,2,3]. Accurate pose estimation enables spacecraft to navigate safely, perform intricate maneuvers, and interact with other objects in space with the required precision. As space missions become increasingly ambitious, the demand for robust and reliable pose estimation techniques has never been greater. Despite the critical nature of this task, achieving accurate pose estimation in space is fraught with challenges. One of the most significant hurdles is the rapid inter-frame motion that spacecraft experience, particularly during aggressive orbital maneuvers or when operating in close proximity to other fast-moving objects [4,5].
Recent advancements in spacecraft pose estimation have explored diverse sensor configurations, including monocular [6], stereo [7], and time-of-flight (ToF) imaging systems [5]. Monocular approaches typically combine a single camera with feature-based methods, such as template matching or 2D keypoint descriptors (e.g., SIFT [8], ORB [9]). While computationally efficient and effective in structured environments, they inherently suffer from depth ambiguity and struggle with the rapid inter-frame motion, sparse textures, and extreme lighting variations common in space scenarios; they also require extensive prior knowledge of target geometry, a critical limitation when operating with uncooperative space objects lacking predefined models [10]. In contrast, depth-enabled modalities such as stereo vision and ToF sensors overcome this constraint by directly acquiring 3D structural measurements through point cloud generation [4,11,12,13]. Accordingly, recent advances in 3D sensing have shifted the focus to point cloud registration techniques: this capability transforms the pose estimation challenge into a temporal point cloud registration problem, where relative transformations between sequential frames can be solved through geometric correspondence alignment.
Iterative Closest Point (ICP) [14] and its variants (e.g., Generalized-ICP [15] and Robust-ICP [16]) are widely used for fine pose refinement but require accurate initial guesses, making them prone to failure under large displacements. To address this, hybrid frameworks combining global and local registration have emerged. For instance, feature descriptors like FPFH [17] and SHOT [18], paired with RANSAC [19], are employed for coarse alignment. More recently, deep learning techniques have been introduced to learn 3D local descriptors. 3DMatch [20] and 3DSmoothNet [21] pioneered this approach by employing a Siamese Network for feature extraction. FCGF [22] eliminates the need for keypoint detection by computing descriptors in a single pass using a fully convolutional neural network. Predator [23] leverages an attention mechanism to detect salient points in overlapping regions, enhancing robustness in low-overlap scenarios. GeoTransformer [24] learns geometric features for superpoint matching, demonstrating strong resilience to low overlap and rigid transformations. However, these transformer-based methods present critical limitations for spacecraft applications. First, their reliance on large-scale annotated training data conflicts with the scarcity of spacecraft point cloud datasets, which are inherently limited by mission-specific configurations and the prohibitive costs of space environment simulations. Second, the computational complexity of transformer operations—particularly cross-attention mechanisms—poses challenges for real-time execution on radiation-hardened spacecraft processors with stringent power constraints. These dual limitations of data dependency and computational intensity motivated our adoption of FPFH for coarse registration. As a handcrafted descriptor, FPFH operates in a training-free manner with low computational complexity, ensuring three key advantages: (1) immediate deployment without data collection or model fine-tuning, (2) deterministic runtime performance on resource-constrained hardware, and (3) inherent robustness to the sparse and irregular point distributions typical in spacecraft scenarios. Notably, while modern descriptors like FCGF and GeoTransformer reduce outlier proportions, no existing method can fully eliminate erroneous correspondences, particularly under extreme noise or partial observations. This underscores the necessity for robust estimators. Random Sample Consensus (RANSAC) [19] is a widely used algorithm for robust estimation in the presence of outliers. It iteratively samples minimal subsets of correspondences to generate pose hypotheses, which are then evaluated against the full correspondence set. However, its efficiency deteriorates in high-outlier scenarios, requiring exponentially more iterations to achieve reliable results. To address this, various RANSAC variants have been proposed to enhance efficiency and robustness. GC-RANSAC [25] improves local optimization by alternating between graph cuts and model refitting. CG-SAC [26] leverages keypoint normals and a compatibility-guided strategy to reduce sampling randomness. SAC-COT [27] ranks and samples ternary loops from the compatibility graph for more precise estimation. SC2-PCR [28] introduces a second-order spatial compatibility (SC2) metric, which emphasizes global rather than local consistency, allowing for more effective inlier identification and robust registration. 
Beyond RANSAC-based approaches, Branch-and-Bound (BnB) methods [29,30,31] explore the 6D parameter space to find globally optimal solutions. While these methods ensure optimality, they suffer from high computational costs, particularly in large correspondence sets with high outlier ratios. Another line of point cloud registration work consists of optimization-based methods, which mitigate the impact of outliers by employing robust loss functions that downweight outlier contributions during iterative estimation. Fast Global Registration (FGR) [32] formulates a dense objective using a scaled Geman–McClure estimator as the penalty function and reduces the influence of local minima through Graduated Nonconvexity (GNC). Yang et al. [31] extended GNC to operate directly with standard nonminimal solvers, creating a versatile framework for spatial perception tasks such as object localization and point cloud registration. However, both FGR and GNC struggle with local minima when dealing with extremely low inlier ratios or in the absence of a good initial guess.
In the context of spacecraft-specific applications, model-based approaches leveraging prior CAD data have gained traction. Ref. [33] integrated ICP with Kalman filtering to track pose under orbital motion, yet their method suffered from error accumulation during high-speed maneuvers. Deep learning techniques, such as PointNet-based architectures [34], have also been explored for direct pose regression. However, these methods demand extensive training data and struggle to generalize to unseen spacecraft geometries or dynamic lighting. To simulate realistic on-orbit conditions, synthetic datasets generated via rendering tools (e.g., Blender, Unity) have become pivotal. Pioneering work by [35] utilized NASA’s 3D models to train vision-based detectors, but their focus was limited to RGB imagery, ignoring depth sensor challenges like noise and sparsity. Recent studies introduced physics-based ToF sensor simulations [4], enabling evaluation of registration algorithms under varying noise levels and motion blur.
Motivated by this paradigm, our work specifically addresses the precision enhancement of point cloud-based pose determination under challenging orbital relative motion conditions. Instead of applying robust loss functions directly in global registration, we introduce a two-stage refinement strategy. The first stage of our framework focuses on computing a coarse pose estimate by leveraging Fast Point Feature Histogram (FPFH) descriptors [17] in conjunction with RANSAC-based correspondence matching [19]. This combination is particularly effective in handling significant positional displacements, making it well-suited for scenarios involving rapid inter-frame motion. The use of FPFH descriptors allows for the efficient computation of geometric features, while RANSAC helps to mitigate the impact of outliers, ensuring a more reliable initial pose estimate. In the second stage, we refine the coarse pose estimate through a geometry-aware fine registration process. This stage utilizes raw point cloud data and employs an ICP-like framework. By incorporating the coarse pose, we can generate more accurate correspondences and, in turn, a more precise refined pose, which is crucial for tracking the spacecraft’s motion over time. This refinement process significantly enhances the precision of the pose estimation, ensuring that the final result is both accurate and reliable. The proposed two-stage framework builds on these foundations by uniquely combining FPFH-RANSAC for robustness to large displacements and motion-constrained ICP for precision. Unlike prior works, the integration of temporal continuity constraints directly addresses error propagation in high-speed scenarios, a gap highlighted in recent benchmarks for autonomous spacecraft navigation. This dual strategy bridges the trade-off between computational efficiency and accuracy, advancing the state of the art for dynamic on-orbit applications.
To validate the effectiveness of our proposed framework, we have developed a comprehensive dataset that simulates time-of-flight (ToF) sensor measurements. This dataset is generated by rendering NASA’s public 3D spacecraft models, followed by back-projection to create synthetic 3D point clouds. The diversity of this dataset allows us to thoroughly test and evaluate our method under different scenarios, ensuring its robustness and versatility. An illustration of the simulated ToF image sequence is presented in Figure 1. Our experiments demonstrate that the proposed two-stage registration framework outperforms conventional methods in both accuracy and robustness. Specifically, our approach achieves an angular error of less than 2.0° and a translational error of less than 10 cm, showcasing its superior performance in dynamic environments. The dual-stage architecture not only maintains tracking continuity but also effectively mitigates error accumulation from fast relative motion, making it a promising solution for autonomous spacecraft proximity operations. The primary novelty of our work lies in the design of a two-stage point cloud registration framework that builds upon conventional ICP-based fine registration. In the coarse registration stage, we introduce a feature-based alignment mechanism enhanced with a length-invariant outlier rejection strategy, combined with a RANSAC-based pose estimation scheme. This allows us to robustly compute an initial pose even in the presence of significant noise and outliers. The resulting coarse pose initialization significantly reduces the likelihood of local minima during the subsequent fine registration stage, thereby improving convergence and stability. This design proves particularly effective in scenarios involving rapid inter-frame motion, which are common in spacecraft operations. In conclusion, the development of our two-stage point cloud registration framework represents a significant advancement in the field of spacecraft pose estimation. By addressing the challenges posed by rapid inter-frame motion and providing a more accurate and robust solution, our work paves the way for safer and more efficient space missions. Additionally, the comprehensive dataset we have contributed will serve as a valuable resource for further research and development in this area.
The main contributions of this work are threefold:
  • Two-stage Point Cloud Registration Framework: We propose a novel two-stage point cloud registration framework tailored for pose estimation of noncooperative spacecraft. This approach incorporates a length-invariant outlier rejection mechanism in the coarse alignment stage, followed by an ICP-based fine registration. The design significantly improves robustness and accuracy under fast inter-frame motion, as demonstrated through extensive quantitative evaluations.
  • Synthetic Benchmark Dataset for Spacecraft Pose Estimation: We construct a comprehensive synthetic dataset using 8 diverse CAD models of spacecraft, each with 10 independently generated sequences and a total of 12,000 annotated frames. The dataset provides accurate ground truth poses and is designed to support rigorous benchmarking under a wide range of motion and viewing conditions.
  • Practical Adaptability for Onboard Applications: The proposed framework is designed with computational efficiency and data generalizability in mind, making it suitable for onboard processing in resource-constrained spacecraft systems. Unlike many deep learning-based methods, our approach avoids the need for large-scale training data and delivers real-time inference performance, which is validated across diverse simulated ToF scenarios.
The remainder of this paper is organized as follows. Section 2 details our novel two-stage point cloud registration methodology, presenting both the coarse alignment mechanism for initial pose approximation and the refined fine registration stage for precision tracking. Section 3 validates the proposed approach through extensive experiments on our newly created benchmark dataset, featuring quantitative comparisons with state-of-the-art methods across multiple evaluation metrics. Section 4 discusses the strengths and limitations of the proposed framework. Finally, Section 5 concludes the paper by summarizing key findings, discussing practical implications for on-orbit servicing missions, and outlining directions for future research in spaceborne perception systems.

2. Materials and Methods

Spacecraft in close-range proximity often exhibit abrupt rotational and translational displacements between consecutive frames. This violates the “small-motion assumption” required by traditional iterative optimization methods (e.g., ICP and its variants), causing severe convergence issues. Conventional registration pipelines relying solely on fine registration are highly susceptible to local minima when initial pose errors exceed their narrow convergence basins. This is exacerbated by sparse geometries and sensor noise in space environments. To address these challenges, our method first introduces a robust coarse registration stage that leverages efficient geometric descriptors for correspondence matching, followed by outlier rejection and robust parameter estimation via RANSAC; this provides a globally consistent initial pose estimate. Second, a fine registration stage is initialized with the coarse alignment result and then performs multi-scale iterative fine optimization. This hierarchical architecture explicitly decouples large-motion estimation from precision refinement, ensuring both convergence reliability and high accuracy.

2.1. Problem Statement

Given a sequential point cloud observation of a noncooperative spacecraft $\{P_t\}_{t=0}^{T}$, where $P_t = \{p_t^i \in \mathbb{R}^3\}_{i=1}^{N_t}$ represents the 3D point cloud captured at time step $t$, we aim to continuously estimate the spacecraft’s 6-DOF pose $T_t \in SE(3)$ through point cloud registration. The pose transformation between consecutive frames should satisfy:
$$T_t = \arg\min_{T \in SE(3)} \sum_{i=1}^{N_t} \left\| T \circ p_t^i - q_t^i \right\|^2$$
where $T_t = \begin{bmatrix} R_t & t_t \\ 0 & 1 \end{bmatrix}$ denotes the homogeneous transformation matrix comprising rotation $R_t \in SO(3)$ and translation $t_t \in \mathbb{R}^3$, and $q_t^i$ denotes the point in the previous frame $P_{t-1}$ corresponding to $p_t^i$. The operator $\circ$ applies the rigid transformation to a point. An illustration of the pipeline of our proposed method is presented in Figure 2.

2.2. Feature-Based Coarse Registration

2.2.1. Point Cloud Preprocessing

Point cloud normals are essential for feature extraction and descriptor generation; we compute them via Principal Component Analysis (PCA) on local neighborhoods. For each point $p_i \in P$, its neighborhood is defined as $N_i = \{ p_j \mid \|p_j - p_i\|_2 < r \}$, and the local covariance matrix is computed as
$$C_i = \frac{1}{|N_i|} \sum_{j \in N_i} (p_j - \mu_i)(p_j - \mu_i)^{\top}$$
where $\mu_i = \frac{1}{|N_i|} \sum_{j \in N_i} p_j$. Eigendecomposition is applied to $C_i$: $C_i v_k = \lambda_k v_k$ with $\lambda_0 \le \lambda_1 \le \lambda_2$, and the normal vector is selected as $n_i = v_0$, the eigenvector associated with the smallest eigenvalue.
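As a minimal illustration of the PCA-based normal estimation described above (not the authors’ C++/PCL implementation), the following Python/NumPy sketch computes a normal for every point from the covariance of its radius neighborhood; the function name and the brute-force neighbor search are illustrative choices only.

```python
import numpy as np

def estimate_normals(points: np.ndarray, r: float) -> np.ndarray:
    """PCA normals: eigen-decompose the covariance of each point's r-neighborhood."""
    normals = np.zeros_like(points)
    for i, p in enumerate(points):
        # Radius search (brute force here; a k-d tree would be used in practice).
        nbrs = points[np.linalg.norm(points - p, axis=1) < r]
        mu = nbrs.mean(axis=0)
        C = (nbrs - mu).T @ (nbrs - mu) / len(nbrs)
        # The eigenvector of the smallest eigenvalue approximates the surface normal.
        _, eigvecs = np.linalg.eigh(C)  # eigh returns eigenvalues in ascending order
        normals[i] = eigvecs[:, 0]
    return normals
```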
Since the raw point cloud is too dense for feature extraction, we apply voxel downsampling to the original point clouds for efficiency; voxel downsampling also yields smoother point clouds. The point cloud is discretized into cubic voxels with edge length $s$: the voxel grid is a collection $V = \bigcup_{m,n,l} V_{m,n,l}$, where each voxel is given by
$$V_{m,n,l} = \left\{ p_i \;\middle|\; m = \left\lfloor \frac{x_i}{s} \right\rfloor,\; n = \left\lfloor \frac{y_i}{s} \right\rfloor,\; l = \left\lfloor \frac{z_i}{s} \right\rfloor \right\}.$$
Finally, the downsampled point is computed as
$$\hat{p}_{m,n,l} = \frac{1}{|V_{m,n,l}|} \sum_{p_i \in V_{m,n,l}} p_i.$$
Then, noise suppression is applied via k-nearest-neighbor statistics: the mean distance to the $k$ nearest neighbors is computed as
$$\bar{d}_i = \frac{1}{k} \sum_{j=1}^{k} \left\| p_i - p_j^{(k)} \right\|_2.$$
The thresholding is determined by $P_{\text{filtered}} = \left\{ p_i \mid \bar{d}_i \le \mu_d + \alpha \sigma_d \right\}$, where $\mu_d = \frac{1}{N} \sum_{i=1}^{N} \bar{d}_i$ and $\sigma_d = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\bar{d}_i - \mu_d)^2}$.
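The preprocessing steps above (voxel averaging and the k-NN statistical filter) can likewise be sketched in a few lines of Python/NumPy; the parameter values k = 16 and alpha = 2.0 are illustrative assumptions, not the settings reported in this paper.

```python
import numpy as np

def voxel_downsample(points: np.ndarray, s: float) -> np.ndarray:
    """Average all points falling into the same cubic voxel of edge length s."""
    keys = np.floor(points / s).astype(np.int64)              # voxel indices (m, n, l)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    counts = np.bincount(inverse).astype(float)
    out = np.zeros((inverse.max() + 1, 3))
    for d in range(3):
        out[:, d] = np.bincount(inverse, weights=points[:, d]) / counts
    return out

def statistical_outlier_removal(points: np.ndarray, k: int = 16, alpha: float = 2.0) -> np.ndarray:
    """Drop points whose mean k-NN distance exceeds mu_d + alpha * sigma_d."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)  # brute-force distances
    knn = np.sort(dists, axis=1)[:, 1:k + 1]                  # skip the zero self-distance
    d_bar = knn.mean(axis=1)
    mu_d, sigma_d = d_bar.mean(), d_bar.std()
    return points[d_bar <= mu_d + alpha * sigma_d]
```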

2.2.2. Feature Extraction and Matching

The FPFH [17] is widely adopted for 3D feature extraction and matching. We briefly review its computation as follows. First, the Simplified Point Feature Histogram (SPFH) is computed. For each point $p_i$ with normal $n_i$ and neighbors $p_j \in N_i$:
$$\alpha_j = \arctan\!\left( \frac{n_j \cdot (p_j - p_i)}{\| p_j - p_i \|} \right), \quad \phi_j = u_i \cdot \frac{p_j - p_i}{d}, \quad \theta_j = \arctan\!\left( n_i \cdot (w_j \times u_i),\; n_i \cdot w_j \right)$$
where:
$$d = \| p_j - p_i \|_2, \quad u_i = \frac{p_j - p_i}{d}, \quad w_j = n_j \times u_i$$
Then, FPFH is constructed as
$$\mathrm{FPFH}(p_i) = \mathrm{SPFH}(p_i) + \frac{1}{|N_i|} \sum_{j \in N_i} \frac{1}{d_j} \mathrm{SPFH}(p_j)$$
with distance weighting $d_j = \| p_j - p_i \|_2$.
The FPFH descriptor is a 33-dimensional vector. Once descriptors are computed, we perform feature matching, adopting several criteria for robustness. First, the ratio test is applied. For the two nearest neighbors $q_1, q_2$ of a query feature:
$$\text{Match valid if:} \quad \frac{D_f(f_p, f_{q_1})}{D_f(f_p, f_{q_2})} < \tau$$
where $D_f(f_a, f_b) = \| f_a - f_b \|_2$ and $\tau \in [0.8, 0.95]$. In addition to the ratio test, we propose a mutual k-nearest-neighbor criterion. Define the k-nearest neighborhoods in descriptor space:
$$N_k(p) = \{ q_1, \ldots, q_k \mid D_f(f_p, f_{q_i}) \text{ sorted in ascending order} \}, \quad N_k(q) = \{ p_1, \ldots, p_k \mid D_f(f_q, f_{p_j}) \text{ sorted in ascending order} \}$$
The final correspondence set is constructed as:
$$C = \{ (p, q) \mid q \in N_k(p) \wedge p \in N_k(q) \}$$
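A compact sketch of this matching stage, combining the ratio test with the mutual k-nearest-neighbor check, is given below; brute-force descriptor distances are used for clarity, and the tau and k values are illustrative rather than the paper’s settings.

```python
import numpy as np

def match_fpfh(feat_src: np.ndarray, feat_tgt: np.ndarray, tau: float = 0.9, k: int = 5):
    """Ratio test plus mutual k-NN consistency on 33-D FPFH descriptors."""
    d = np.linalg.norm(feat_src[:, None, :] - feat_tgt[None, :, :], axis=-1)  # (Ns, Nt) distances
    nn_st = np.argsort(d, axis=1)[:, :k]       # k nearest target features per source feature
    nn_ts = np.argsort(d, axis=0)[:k, :].T     # k nearest source features per target feature
    matches = []
    for i in range(d.shape[0]):
        q1, q2 = nn_st[i, 0], nn_st[i, 1]
        if d[i, q1] / max(d[i, q2], 1e-12) >= tau:   # Lowe-style ratio test
            continue
        if i in nn_ts[q1]:                           # mutual k-NN check
            matches.append((i, q1))
    return np.asarray(matches)
```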

2.2.3. RANSAC-Based Pose Estimation

Given the initial correspondences $C = \{ (p_i, q_i) \}_{i=1}^{M}$, we introduce pairwise length invariance for outlier rejection. For two correspondences $(p_a, q_a)$ and $(p_b, q_b)$:
$$\Delta l_{ab} = \| p_a - p_b \|_2 - \| q_a - q_b \|_2$$
and the pairwise consistency is scored via a robust kernel:
$$s_{ab} = \max\!\left( 0,\; 1 - \frac{\Delta l_{ab}^2}{\sigma_l^2} \right)$$
where $\sigma_l$ controls the geometric tolerance (typically 0.1 times the median edge length). The voting scores are then aggregated: for $N$ random pairs, only the top-$K$ matches survive, and the filtered correspondence set is constructed as follows:
$$\mathrm{Score}(c_i) = \sum_{j=1}^{N} s_{ij} \cdot \mathbb{I}(\Delta l_{ij} < 3\sigma_l)$$
$$C_{\text{filtered}} = \left\{ c_i \mid \mathrm{Score}(c_i) \in \text{Top-}K\!\left( \{ \mathrm{Score}(c_j) \}_{j=1}^{M} \right) \right\}$$
This outlier rejection mechanism effectively improves the quality of point correspondences, which enhances the accuracy and robustness of the subsequent fine registration. By filtering out inconsistent matches based on length invariance, the estimated coarse pose is more reliable, providing a well-initialized alignment that prevents the optimization from falling into local minima. This is particularly beneficial in scenarios with large inter-frame motion, as demonstrated in the experimental section. After outlier rejection, RANSAC is applied for pose estimation. We randomly sample three correspondences and solve:
$$\min_{R, t} \sum_{k=1}^{3} \| R p_k + t - q_k \|^2 \quad \text{s.t.} \quad R \in SO(3),\; t \in \mathbb{R}^3$$
This problem has a closed-form solution via singular value decomposition (SVD). For model verification, we compute the inlier count:
$$\mathrm{IR} = \sum_{(p_i, q_i)} \mathbb{I}\!\left( \| R p_i + t - q_i \|_2 < \epsilon_{\text{th}} \right)$$
Finally, we refine the pose with all surviving inliers. This procedure is repeated for $K$ iterations (typically 10–20), and the best transformation is updated as follows:
$$T^{*} = \arg\min_{T \in SE(3)} \sum_{(p_i, q_i) \in C_{\text{inlier}}} \| T \circ p_i - q_i \|^2.$$
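To make the coarse stage concrete, the sketch below strings together the length-invariance voting and the three-point RANSAC with a closed-form SVD solver. It is an illustrative Python/NumPy reimplementation, not the authors’ C++/PCL code, and the parameters sigma_l, n_pairs, top_k, eps, and iters are hypothetical defaults rather than the values used in the paper.

```python
import numpy as np

def kabsch(P, Q, w=None):
    """Closed-form fit (weighted Kabsch/SVD) minimizing sum w_i ||R p_i + t - q_i||^2."""
    w = np.ones(len(P)) if w is None else w
    p_bar = (w[:, None] * P).sum(0) / w.sum()
    q_bar = (w[:, None] * Q).sum(0) / w.sum()
    H = (w[:, None] * (P - p_bar)).T @ (Q - q_bar)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # enforce a proper rotation
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    return R, q_bar - R @ p_bar

def length_invariance_filter(P, Q, sigma_l, n_pairs=200, top_k=500, rng=np.random.default_rng(0)):
    """Score each correspondence by pairwise length consistency and keep the top-K."""
    M = len(P)
    scores = np.zeros(M)
    for _ in range(n_pairs):
        j = rng.integers(M, size=M)     # one random partner per correspondence
        dl = np.linalg.norm(P - P[j], axis=1) - np.linalg.norm(Q - Q[j], axis=1)
        s = np.maximum(0.0, 1.0 - dl**2 / sigma_l**2)
        scores += s * (np.abs(dl) < 3 * sigma_l)
    keep = np.argsort(-scores)[:top_k]
    return P[keep], Q[keep]

def ransac_pose(P, Q, eps=0.05, iters=20, rng=np.random.default_rng(0)):
    """3-point RANSAC: sample minimal sets, score by inlier count, refit on the best inlier set."""
    best_R, best_t, best_inliers = np.eye(3), np.zeros(3), np.zeros(len(P), bool)
    for _ in range(iters):
        idx = rng.choice(len(P), 3, replace=False)
        R, t = kabsch(P[idx], Q[idx])
        inliers = np.linalg.norm((P @ R.T + t) - Q, axis=1) < eps
        if inliers.sum() > best_inliers.sum():
            best_R, best_t, best_inliers = R, t, inliers
    if best_inliers.sum() >= 3:
        best_R, best_t = kabsch(P[best_inliers], Q[best_inliers])
    return best_R, best_t
```

In this sketch, the correspondences surviving length_invariance_filter would be passed to ransac_pose, and the resulting coarse pose would initialize the fine registration described in Section 2.3.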

2.3. Geometry-Based Fine Registration

After obtaining a coarse pose estimate from RANSAC-based alignment, there is still room for refinement due to inaccuracies in feature positions. These inaccuracies can arise from sensor noise, imperfect feature extraction, or outlier correspondences. To address this, we refine the pose by leveraging the raw point cloud geometry through an iterative process that involves regenerating correspondences and solving for an improved transformation using a weighted SVD approach. Given a coarse transformation $T_c = [R_c \mid t_c]$, we apply it to the source point cloud $P = \{ p_i \}_{i=1}^{N}$ to obtain transformed points:
$$p_i' = R_c p_i + t_c.$$
For each transformed point $p_i'$, we search for its nearest neighbor in the target point cloud $Q = \{ q_j \}_{j=1}^{M}$ using the Euclidean distance:
$$q_i = \arg\min_{q_j \in Q} \| p_i' - q_j \|.$$
This process generates a new set of correspondences $\{ (p_i, q_i) \}$, which better reflects the true alignment. Once correspondences are established, we refine the transformation by solving a weighted least-squares optimization problem. The objective is to minimize:
$$\sum_i w_i \| R p_i + t - q_i \|^2,$$
where $w_i$ is a robust weight determined using a kernel function of the point-wise residuals:
$$w_i = \phi\!\left( \| R_c p_i + t_c - q_i \| \right),$$
and $\phi(\cdot)$ is derived from a robust Cauchy loss to mitigate the influence of outliers. We then compute the weighted centroids of both point sets:
$$\bar{p} = \frac{\sum_i w_i p_i}{\sum_i w_i}, \quad \bar{q} = \frac{\sum_i w_i q_i}{\sum_i w_i}.$$
The cross-covariance matrix is given by:
$$H = \sum_i w_i (p_i - \bar{p})(q_i - \bar{q})^{T}.$$
Using the singular value decomposition (SVD) $H = U \Sigma V^{T}$, the optimal rotation is obtained as $R = V U^{T}$. If $\det(R) < 0$, we enforce a proper rotation by flipping the last column of $V$, i.e., $V[:,3] \leftarrow -V[:,3]$. The optimal translation is then given by $t = \bar{q} - R \bar{p}$. To accelerate convergence and improve robustness, we implement a multi-scale framework: (1) start with a coarsely downsampled point cloud to quickly establish an initial refinement; (2) iteratively refine the pose at increasing resolutions; (3) at each scale, update correspondences and recompute the transformation; (4) stop when convergence criteria (e.g., transformation change below a threshold) are met.
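The fine stage can be sketched as an ICP-style loop with Cauchy weights and a coarse-to-fine schedule, reusing the kabsch and voxel_downsample helpers from the earlier sketches; the scale parameter c and the voxel schedule below are illustrative assumptions, not the paper’s settings.

```python
import numpy as np

def cauchy_weight(r, c=0.05):
    """Cauchy robust weight for a residual r; c is an assumed scale parameter (metres)."""
    return 1.0 / (1.0 + (r / c) ** 2)

def refine_pose(P, Q, R, t, iters=30, c=0.05):
    """ICP-like refinement: re-associate nearest neighbours, then solve a weighted SVD fit."""
    for _ in range(iters):
        P_tr = P @ R.T + t
        # Nearest-neighbour correspondences (brute force; a k-d tree would be used in practice).
        nn = np.argmin(np.linalg.norm(P_tr[:, None, :] - Q[None, :, :], axis=-1), axis=1)
        Qm = Q[nn]
        residuals = np.linalg.norm(P_tr - Qm, axis=1)
        w = cauchy_weight(residuals, c)
        R, t = kabsch(P, Qm, w)          # weighted closed-form solve (see the coarse-stage sketch)
    return R, t

def multiscale_refine(P, Q, R, t, voxel_sizes=(0.10, 0.05, 0.025)):
    """Coarse-to-fine schedule: refine on increasingly dense resamplings of both clouds."""
    for s in voxel_sizes:
        Ps, Qs = voxel_downsample(P, s), voxel_downsample(Q, s)
        R, t = refine_pose(Ps, Qs, R, t)
    return R, t
```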

2.4. Point Cloud Dataset Construction

Let the spacecraft CAD model be represented as a triangular mesh $M = \{ v_i \in \mathbb{R}^3 \}_{i=1}^{N}$ with surface normals $\{ n_i \}$. The perspective projection follows:
$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \frac{1}{z_c} K [R \mid t] \begin{bmatrix} v_w \\ 1 \end{bmatrix},$$
where:
  • $v_w \in \mathbb{R}^3$: 3D vertex in world coordinates
  • $R \in SO(3)$, $t \in \mathbb{R}^3$: camera extrinsic parameters
  • $K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$: intrinsic matrix
  • $(u, v)$: pixel coordinates in the image plane
For each pixel coordinate $(u, v)$, its normalized coordinate is given by:
$$(x_n, y_n, 1) = \left( \frac{u - c_x}{f_x},\; \frac{v - c_y}{f_y},\; 1 \right).$$
Then, its corresponding 3D point in the camera coordinate frame is computed by scaling the normalized coordinate with the depth value $d_{uv}$:
$$v_{cam} = \left( d_{uv} \cdot \frac{u - c_x}{f_x},\; d_{uv} \cdot \frac{v - c_y}{f_y},\; d_{uv} \right).$$
This formula can be rewritten in matrix form as follows:
$$v_{cam} = d_{uv} \cdot K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \quad K^{-1} = \begin{bmatrix} \frac{1}{f_x} & 0 & -\frac{c_x}{f_x} \\ 0 & \frac{1}{f_y} & -\frac{c_y}{f_y} \\ 0 & 0 & 1 \end{bmatrix}.$$
For each pixel coordinate $(u, v)$, we cast a ray from the camera origin $o$ through the pixel:
$$r(t) = o + t \cdot \mathbf{d}_{uv}, \quad t \ge 0,$$
where the ray direction $\mathbf{d}_{uv}$ is computed as:
$$\mathbf{d}_{uv} = R^{\top} K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}.$$
The depth value $d_{uv}$ is obtained by solving the ray–mesh intersection:
$$d_{uv} = \min \{ t > 0 \mid r(t) \in M \}.$$
For each valid pixel $(u, v)$ with depth $d_{uv}$, the 3D point $p_w$ in world coordinates is obtained by converting $v_{cam}$ with $[R \mid t]$:
$$p_w = R^{\top} \left( v_{cam} - t \right),$$
where v c a m is computed by back-projecting the depth pixel with Equation (28). To simulate orbital motion, we render sequences with controlled viewpoint transitions. The process for synthetic data generation is summarized in Algorithm 1.
Algorithm 1 Synthetic point cloud generation.
1: Initialize camera trajectory $\{ T_k \}_{k=1}^{N}$ with orbital dynamics
2: for each frame $k$ do
3:    Ray cast CAD model $M$ at pose $T_k$ → get depth map $D_k$
4:    Add noise: $\tilde{D}_k = D_k + \epsilon_{\text{ToF}}$
5:    Back-project: $P_k = \{ p_i \mid p_i = \text{Equation (32)}(\tilde{d}_i) \}$
6:    Apply sensor artifacts: $P_k \leftarrow P_k \setminus \{ p \mid \angle(n_p, v_{\text{ray}}) > \theta_{\max} \}$
7: end for
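Steps 4–5 of Algorithm 1 (noise injection and back-projection) can be illustrated with the following sketch, assuming the depth map comes from a ray-casting renderer and that $[R \mid t]$ maps world to camera coordinates; the noise_sigma value is a hypothetical default, and the grazing-angle culling of step 6 is omitted for brevity.

```python
import numpy as np

def backproject_depth(depth: np.ndarray, K: np.ndarray, R: np.ndarray, t: np.ndarray,
                      noise_sigma: float = 0.02, rng=np.random.default_rng(0)) -> np.ndarray:
    """Turn a rendered depth map into a world-frame point cloud with additive ToF-like noise."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0                                               # pixels where the ray hit the mesh
    d = depth[valid] + rng.normal(0.0, noise_sigma, valid.sum())    # simulated sensor noise
    pix = np.stack([u[valid], v[valid], np.ones(valid.sum())])      # homogeneous pixel coords (3, N)
    v_cam = (np.linalg.inv(K) @ pix) * d                            # camera-frame points, scaled by depth
    p_world = R.T @ (v_cam - t[:, None])                            # camera -> world: p_w = R^T (v_cam - t)
    return p_world.T                                                # (N, 3)
```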

3. Results

To comprehensively evaluate the proposed two-stage point cloud registration framework, we conducted systematic experiments under varying motion dynamics and sensing conditions.

3.1. Dataset

This study utilized 3D models of eight NASA spacecraft obtained from official NASA repositories: Aura (Earth Observing System spacecraft built by Northrop Grumman, Redondo Beach, CA, USA); Chandra X-ray Observatory (space telescope built by Northrop Grumman, Redondo Beach, CA, USA, operated by Smithsonian Astrophysical Observatory, Cambridge, MA, USA); Deep Space 1 (technology demonstrator spacecraft built by Jet Propulsion Laboratory, Pasadena, CA, USA); ICESat-2 (Ice, Cloud and land Elevation Satellite-2 with ATLAS instrument developed by NASA Goddard Space Flight Center, Greenbelt, MD, USA; spacecraft bus by Northrop Grumman, Gilbert, AZ, USA); Jason-1 (oceanography satellite built by Thales Alenia Space, Cannes, France for CNES, and Jet Propulsion Laboratory, Pasadena, CA, USA for NASA); Juno (Jupiter orbiter built by Lockheed Martin Space, Denver, CO, USA, operated by Jet Propulsion Laboratory, Pasadena, CA, USA); MESSENGER (Mercury orbiter built by Johns Hopkins University Applied Physics Laboratory, Laurel, MD, USA); TOPEX/Poseidon (oceanography satellite built by Fairchild Industries, Germantown, MD, USA with NASA instruments by Jet Propulsion Laboratory, Pasadena, CA, USA). All digital models were sourced from NASA’s publicly available 3D model repositories [36], covering diverse configurations including satellites, orbital transfer vehicles, and exploration probes. Each model was converted into a watertight mesh with triangle counts ranging from 50 K to 200 K faces. Each model has 10 sequences, and each sequence contains 150 frames rendered with the algorithm in Section 2.4. Four examples of the simulated normal maps of various spacecraft are illustrated in Figure 3. For pose trajectory simulation, the initial transformation is randomly sampled from the 6D pose space, and the incremental Euler angle and translation along each axis are randomly increased within [0.5°, 2.0°] and [−50 cm, 50 cm] for subsequent frames, generating challenging trajectories with sudden maneuvers and introducing discontinuous jumps in angular and positional velocity.
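A sketch of the incremental trajectory sampling described above is given below for reference; it uses SciPy’s Rotation for Euler-angle composition, and the initial position range is an illustrative assumption rather than a value taken from the paper.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def random_trajectory(n_frames: int = 150, seed: int = 0):
    """Incremental 6-DOF trajectory: per-axis Euler steps in [0.5, 2.0] deg, translations in [-50, 50] cm."""
    rng = np.random.default_rng(seed)
    R = Rotation.random(random_state=seed).as_matrix()   # random initial attitude
    t = rng.uniform(-1.0, 1.0, 3)                        # assumed initial position range (metres)
    poses = [(R.copy(), t.copy())]
    for _ in range(n_frames - 1):
        d_euler = rng.uniform(0.5, 2.0, 3)               # per-axis angular increment in degrees
        d_t = rng.uniform(-0.5, 0.5, 3)                  # per-axis translation increment in metres
        R = Rotation.from_euler("xyz", d_euler, degrees=True).as_matrix() @ R
        t = t + d_t
        poses.append((R.copy(), t.copy()))
    return poses
```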

3.2. Evaluation Metrics

To quantitatively assess the performance of feature- and geometry-based point cloud registration algorithms, we adopt three accuracy metrics. The Relative Rotation Error (RRE) computes the angular deviation between the estimated rotation $\hat{R}_n$ and the ground truth $R_{n,\text{gt}}$:
$$\mathrm{RRE} = \frac{1}{N_{total}} \sum_{n=1}^{N_{total}} \frac{180}{\pi} \arccos\!\left( \frac{\mathrm{tr}(\hat{R}_n^{\top} R_{n,\text{gt}}) - 1}{2} \right),$$
where $\mathrm{tr}(\cdot)$ denotes the matrix trace. The Relative Translation Error (RTE) evaluates the Euclidean distance between the estimated translation $t_n$ and the ground truth $t_{n,\text{gt}}$:
$$\mathrm{RTE} = \frac{1}{N_{total}} \sum_{n=1}^{N_{total}} \| t_n - t_{n,\text{gt}} \|_2,$$
where $\| \cdot \|_2$ represents the Euclidean norm and $N_{total} = 12{,}000$ is the total number of tested frames. The transformation Root Mean Squared Error (RMSE) measures the Euclidean distance error between the transformed source point cloud and the target point cloud:
$$\mathrm{RMSE} = \sqrt{ \frac{1}{|C^{*}|} \sum_{(p_{x_i^{*}},\, q_{y_i^{*}}) \in C^{*}} \| T \circ p_{x_i^{*}} - q_{y_i^{*}} \|_2^2 },$$
where $T$ is the predicted transformation and $C^{*}$ is the set of ground truth correspondences.
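For reference, the three metrics can be computed per frame as in the following sketch (the arccos argument is clamped for numerical safety); averaging over all frames gives the values reported in the tables below.

```python
import numpy as np

def rre_deg(R_est: np.ndarray, R_gt: np.ndarray) -> float:
    """Relative rotation error in degrees: geodesic angle between estimated and ground-truth rotations."""
    cos_theta = np.clip((np.trace(R_est.T @ R_gt) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))

def rte(t_est: np.ndarray, t_gt: np.ndarray) -> float:
    """Relative translation error: Euclidean distance between translation vectors."""
    return float(np.linalg.norm(t_est - t_gt))

def rmse(P_src: np.ndarray, Q_tgt: np.ndarray, R: np.ndarray, t: np.ndarray) -> float:
    """RMSE over ground-truth correspondences (row i of P_src corresponds to row i of Q_tgt)."""
    err = (P_src @ R.T + t) - Q_tgt
    return float(np.sqrt((err ** 2).sum(axis=1).mean()))
```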

3.3. Implementation Details

All experiments are carried out on an Intel i9-9900K CPU at 3.7 GHz. We implement the proposed algorithm in C++ with the PCL library. The voxel size $s$ is set to 2.5, 5.0, 7.5, and 10 cm, respectively, for the experiments on efficiency and accuracy. All spacecraft models are scaled into a cube with a 3 m edge length and placed 3 m to 10 m in front of the virtual ToF camera. For ToF imaging, the principal point is set to the image center at a resolution of 512 × 512, and the focal length is set to 525.

3.4. Comparison to Other Methods

3.4.1. Quantitative Comparison

We compare our method with several state-of-the-art pose estimation methods, including four representative traditional methods (ICP [14], NDT [37], LSG-CPD [38], and RobustICP [16]) and two recent learning-based methods (IDAM [39] and RPM-Net [40]). We use zero-mean Gaussian noise with standard deviation σ = 2 cm to simulate the real noise level. The experimental results, as summarized in Table 1, demonstrate the effectiveness of the proposed dual-stage framework compared to state-of-the-art registration methods across various spacecraft sequences. On average, our method achieves the lowest rotation error (RRE: 1.90°) and translation error (RTE: 8.44 cm), outperforming traditional approaches like ICP (RRE: 11.48°, RTE: 36.52 cm) and NDT (RRE: 10.76°, RTE: 43.87 cm), as well as recent learning-based methods such as IDAM (RRE: 2.32°, RTE: 10.70 cm) and RPM-Net (RRE: 2.10°, RTE: 9.59 cm). Notably, our framework excels in scenarios with aggressive motion, as evidenced by its superior performance on the DeepSpace sequence (RRE: 1.66°, RTE: 9.13 cm) and the Chandra sequence (RTE: 4.10 cm), where fast inter-frame displacements challenge conventional algorithms. While LSG-CPD and RPM-Net exhibit competitive accuracy in certain cases (e.g., Aura and Messenger), our method maintains consistent robustness across all sequences, particularly under large positional shifts. The geometry-aware refinement stage contributes significantly to error reduction, as shown by the lower RTE in critical cases like Topex (4.79 cm vs. LSG-CPD’s 7.54 cm and RobustICP’s 8.59 cm). These results validate the dual-stage architecture’s ability to balance coarse-to-fine registration, effectively mitigating error accumulation from rapid motion while ensuring high precision, a critical requirement for autonomous spacecraft operations.

3.4.2. Robustness to Noise

To evaluate our method’s resilience to sensor imperfections, we inject zero-mean Gaussian noise with standard deviations ranging from σ = 1 to 5 cm (relative to the original point cloud coordinates) into the test sequences. As shown in Table 2, the proposed method demonstrates remarkable robustness against varying levels of sensor noise, outperforming both traditional and learning-based registration techniques. Across all noise levels (σ = 1 to 5 cm), our approach achieves the lowest average rotation error (RRE: 2.23°) and translation error (RTE: 9.43 cm), significantly surpassing traditional methods like ICP (RRE: 10.69°, RTE: 40.77 cm) and NDT (RRE: 9.41°, RTE: 38.70 cm). Notably, under severe noise conditions (σ = 5 cm), our method maintains superior performance with RRE and RTE values of 3.14° and 12.24 cm, respectively, outperforming even learning-based methods such as RPM-Net (RRE: 6.27°, RTE: 15.61 cm) and IDAM (RRE: 7.23°, RTE: 17.85 cm). The dual-stage architecture, combining coarse feature-based alignment with geometry-aware refinement, effectively mitigates noise interference, as evidenced by the minimal error escalation (ΔRRE: 1.38°, ΔRTE: 4.22 cm) as noise increases from σ = 1 cm to 5 cm. In contrast, methods like LSG-CPD and RobustICP exhibit sharper error degradation (e.g., LSG-CPD’s RTE rises from 6.31 cm to 18.46 cm). The results highlight the framework’s resilience to noise, attributed to its multi-scale iterative refinement that prioritizes geometric consistency over raw descriptor matching. This capability is critical for real-world spacecraft operations, where sensor measurements are inherently noisy and precise pose estimation under uncertainty remains paramount.

3.4.3. Qualitative Results

The quantitative advantages of our method are further corroborated through visual comparisons under varying noise levels. Heavy Sensor Noise: As shown in Figure 4, our method maintains precise alignment even under severe noise ( σ = 5 cm). The two-stage architecture successfully preserves structural contours through its outlier rejection module, enabling correct identification of solar panel edges obscured by noise. Rapid Spinning Motion: The third row of Figure 4 illustrates trajectory estimations during high-speed rotation. Our method closely follows the ground truth (blue) through effective outlier rejection and model fitting in the coarse stage and multiscale iterative framework in fine registration. Complex Geometric Structures: The last row of Figure 4 demonstrates our superiority in handling spacecraft with intricate components. Our feature-aware matching correctly associates asymmetric protrusions through effective coarse matching to align docking ports with antenna masts successfully. These visual results confirm that our method simultaneously addresses three critical challenges in spaceborne perception: noise resilience through hierarchical filtering, motion robustness via predictive alignment, and structural awareness via discriminative feature matching.

3.5. Ablation Study

To validate our architectural design, we conduct ablation studies comparing three variants: (1) coarse-only, using the initial alignment without refinement; (2) fine-only, without initialization; and (3) the full coarse-to-fine approach. Quantitative results are shown in Table 3. In addition, we discuss the effectiveness of the coarse stage in addressing the limitations of FPFH and how our method improves the inlier ratio; the results are presented in Figure 5.
We conducted experiments to evaluate the limitations of the coarse-only method using FPFH features and demonstrate the effectiveness of our proposed efficient outlier rejection strategy. The results, as shown in Figure 5, highlight significant improvements in inlier ratio (IR) after applying our method, particularly in scenarios with repetitive structures (e.g., solar panels) that inherently generate high outlier ratios. Several conclusions can be reached: Outlier Rejection in Repetitive Structures: In the second row, the solar panel region initially exhibited a large number of false matches due to its repetitive geometry. Our outlier rejection method successfully eliminated nearly all outliers, improving the IR by nearly 10× (e.g., from 7.70% to 76.00% in the fourth column). Robustness to Extreme Outliers: The first five rows show initial IRs below 10%, validating our method’s capability to handle scenarios with severe outlier contamination. Post-rejection, the IRs consistently exceeded 50%, with the highest reaching 87% (sixth row, fourth column). Impact of Survival Matches: Columns 2–4 illustrate IRs under varying survival match counts (1000, 500, 100). As the number of survival matches decreases, the IR progressively increases, demonstrating that our method efficiently prioritizes high-quality correspondences. For instance, in the sixth row, reducing survival matches from 1000 to 100 improved the IR from 13.44% to 87.00% (6.47× enhancement). In summary, the substantial IR improvements directly address the computational inefficiency of RANSAC in high-outlier regimes. By reducing outlier proportions, our method minimizes the required RANSAC iterations, thereby enhancing both computational efficiency and registration accuracy. These results underscore the practicality of our approach for real-world applications involving complex geometries and aggressive motion.
Several key conclusions can be drawn from the analysis. Coarse-stage Limitations: The coarse-only variant achieves moderate accuracy (5.32°) but suffers from residual alignment errors, particularly in handling symmetric structures where multiple plausible solutions exist. Fine-stage Sensitivity: The fine-only implementation demonstrates severe performance degradation (8.71°), with a higher probability of becoming trapped in local minima due to poor initialization. This confirms our hypothesis that traditional optimization-based refinement cannot compensate for inadequate initial guesses in noncooperative scenarios. Synergy Analysis: The full pipeline reduces errors by 64.3% (rotation) and 34.4% (translation) compared to the best individual component, proving the necessity of our hierarchical design. The coarse stage provides initialization within the convergence basin (avg. initial error < 5.32°), enabling robustness and accuracy in the fine stage. This study verifies that neither component alone suffices for space-grade precision, while their integration creates complementary advantages: the feature-based coarse stage ensures global convergence, and the optimization-based fine stage guarantees physical plausibility.
To further validate the necessity and effectiveness of our proposed coarse-to-fine two-stage registration framework, we conduct ablation experiments under challenging scenarios with large inter-frame motions. Specifically, the source and target point clouds are sampled from sequential frames with increasing temporal intervals (10, 20, 30, 40, 50), where larger intervals correspond to rapid relative motions. As shown in Figure 6, directly applying fine registration without a robust initial pose estimation fails to converge to the global optimum, as the algorithm becomes trapped in local minima due to the substantial misalignment. In contrast, our coarse registration stage reliably provides a near-optimal initial transformation for intervals up to 40, achieving successful alignment even under severe pose discrepancies. This coarse alignment effectively narrows the search space for the subsequent fine registration stage, enabling iterative refinement to further minimize the registration error. Notably, when the interval exceeds 50, the motion magnitude surpasses the coarse stage’s capability, highlighting the practical bounds of our method. These experiments confirm that the coarse-to-fine pipeline is critical for handling large motions, balancing robustness and accuracy in spacecraft pose estimation tasks.

3.6. Runtime Performance Analysis

The computational-accuracy trade-off of our point cloud registration pipeline is systematically evaluated through voxel size variations, as summarized in Table 4. The qualitative results are presented in Figure 7. Three distinct operational regimes emerge from this analysis. High-Precision Mode (2.5 cm): achieves sub-1.5° angular accuracy at the cost of computational intensity (179 ms/frame), making it suited to critical docking phases that demand the highest precision. Balanced Mode (5 cm): maintains 1.90°/8.44 cm accuracy while keeping processing time below 100 ms, striking an optimal balance for most orbital servicing tasks; this configuration satisfies the 10 Hz update rate required by standard spaceborne computers. Real-Time Mode (10 cm): enables 15 FPS performance (67 ms/frame) through aggressive downsampling, demonstrating potential for rapid motion tracking despite increased errors (4.23°/15.78 cm). This analysis confirms our method’s adaptability to mission requirements, from precision-oriented operations to time-critical tracking scenarios.

4. Discussion

The proposed two-stage point cloud registration framework demonstrates strong potential for spacecraft pose estimation under challenging conditions such as rapid inter-frame motion and sensor noise. By combining a feature-based coarse alignment with a geometry-aware fine refinement, the method effectively balances robustness, accuracy, and computational efficiency. A key strength of this approach lies in its ability to circumvent the local minima issue commonly faced by traditional ICP-based optimization methods. The introduction of a length-invariant outlier rejection strategy significantly improves correspondence quality prior to RANSAC, resulting in a more reliable coarse pose initialization. This initialization serves as a strong prior for the subsequent fine registration stage, which leverages multi-scale iterative refinement to ensure precision even in cases with large initial misalignments. Another notable advantage is the framework’s suitability for real-time deployment on resource-constrained spacecraft systems. Unlike deep learning-based methods, which often require extensive training data and substantial inference resources, the proposed method is entirely training-free and exhibits deterministic performance. This makes it a practical choice for onboard processing where computational budgets and data availability are limited. The robustness of the method under sensor noise and motion dynamics has been extensively validated using a synthetically generated benchmark dataset. However, a potential limitation is that the simulation-based dataset, while diverse and realistic, may not capture all physical complexities of actual on-orbit environments, such as unmodeled reflectance properties or unexpected occlusions. Additionally, although the current framework excels in rigid registration scenarios, handling deformable or articulated spacecraft components remains an open challenge. In future work, we plan to extend the method’s applicability to more complex real-world datasets and consider integration with visual-inertial fusion frameworks. Furthermore, incorporating uncertainty modeling or confidence estimation into the registration pipeline may enhance robustness in ambiguous or degenerate cases.

5. Conclusions

In this work, we present a robust two-stage framework for spacecraft pose estimation tailored to address challenges posed by rapid inter-frame motion. By integrating FPFH-based coarse registration with RANSAC and a geometry-aware multi-scale fine registration stage, the proposed method effectively handles large positional displacements while achieving high precision. Experimental validation using simulated ToF sensor data derived from NASA spacecraft models demonstrates the framework’s superiority in accuracy and robustness over existing approaches, particularly under aggressive orbital maneuvers. The architecture’s ability to mitigate error accumulation from fast relative motion and maintain tracking continuity highlights its potential for real-world autonomous spacecraft proximity operations. This work advances the state of the art in pose estimation by balancing computational efficiency with registration fidelity, offering a promising solution for dynamic space missions requiring reliable and real-time navigation capabilities.

Author Contributions

Conceptualization, M.Z.; Methodology, L.X.; Software, M.Z.; Validation, M.Z.; Writing—original draft, M.Z.; Project administration, L.X.; Funding acquisition, L.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by “Pioneer and Leading +X” Science and Technology Plan, Zhejiang Provincial Science and Technology Plan Project (No. 2025C01035).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Srivastava, R.; Sah, R.; Das, K. Model predictive control of floating space robots for close proximity on-orbit operations. In Proceedings of the AIAA SCITECH 2023 Forum, National Harbor, MD, USA, 23–27 January 2023; p. 0157. [Google Scholar]
  2. Jawaid, M.; Elms, E.; Latif, Y.; Chin, T.J. Towards bridging the space domain gap for satellite pose estimation using event sensing. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 11866–11873. [Google Scholar]
  3. Zhang, Y.; Wang, J.; Chen, J.; Shi, D.; Chen, X. A Space Non-Cooperative Target Recognition Method for Multi-Satellite Cooperative Observation Systems. Remote Sens. 2024, 16, 3368. [Google Scholar] [CrossRef]
  4. Hu, L.; Sun, D.; Duan, H.; Shu, A.; Zhou, S.; Pei, H. Non-cooperative spacecraft pose measurement with binocular camera and tof camera collaboration. Appl. Sci. 2023, 13, 1420. [Google Scholar] [CrossRef]
  5. Hu, J.; Li, S.; Xin, M. Real-time pose determination of ultra-close non-cooperative satellite based on time-of-flight camera. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 8239–8254. [Google Scholar] [CrossRef]
  6. Pauly, L.; Rharbaoui, W.; Shneider, C.; Rathinam, A.; Gaudillière, V.; Aouada, D. A survey on deep learning-based monocular spacecraft pose estimation: Current state, limitations and prospects. Acta Astronaut. 2023, 212, 339–360. [Google Scholar] [CrossRef]
  7. Gavilanez, G.; Moncayo, H. Vision-based relative position and attitude determination of non-cooperative spacecraft using a generative model architecture. Acta Astronaut. 2024, 225, 131–140. [Google Scholar] [CrossRef]
  8. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  9. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571. [Google Scholar]
  10. Yuan, H.; Chen, H.; Wu, J.; Kang, G. Non-Cooperative Spacecraft Pose Estimation Based on Feature Point Distribution Selection Learning. Aerospace 2024, 11, 526. [Google Scholar] [CrossRef]
  11. Pensado, E.A.; de Santos, L.M.G.; Jorge, H.G.; Sanjurjo-Rivo, M. Deep learning-based target pose estimation using lidar measurements in active debris removal operations. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 5658–5670. [Google Scholar] [CrossRef]
  12. Liu, J.; Sun, W.; Yang, H.; Zeng, Z.; Liu, C.; Zheng, J.; Liu, X.; Rahmani, H.; Sebe, N.; Mian, A. Deep learning-based object pose estimation: A comprehensive survey. arXiv 2024, arXiv:2405.07801. [Google Scholar]
  13. Renaut, L.; Frei, H.; Nuchter, A. CNN-based Pose Estimation of a Non-Cooperative Spacecraft with Symmetries from LiDAR Point Clouds. IEEE Trans. Aerosp. Electron. Syst. 2024, 61, 5002–5016. [Google Scholar] [CrossRef]
  14. Besl, P.J.; McKay, N.D. Method for registration of 3-D shapes. In Sensor Fusion IV: Control Paradigms and Data Structures; SPIE: Bellingham, WA, USA, 1992; Volume 1611, pp. 586–606. [Google Scholar]
  15. Segal, A.; Haehnel, D.; Thrun, S. Generalized-icp. In Robotics: Science and Systems; University of Washington: Seattle, WA, USA, 2009; Volume 2, p. 435. [Google Scholar]
  16. Zhang, J.; Yao, Y.; Deng, B. Fast and robust iterative closest point. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3450–3466. [Google Scholar] [CrossRef] [PubMed]
  17. Rusu, R.B.; Blodow, N.; Beetz, M. Fast point feature histograms (FPFH) for 3D registration. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009; pp. 3212–3217. [Google Scholar]
  18. Tombari, F.; Salti, S.; Di Stefano, L. Unique signatures of histograms for local surface description. In Proceedings of the Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010; Proceedings, Part III 11. Springer: Berlin/Heidelberg, Germany, 2010; pp. 356–369. [Google Scholar]
  19. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
  20. Zeng, A.; Song, S.; Nießner, M.; Fisher, M.; Xiao, J.; Funkhouser, T. 3dmatch: Learning local geometric descriptors from RGB-D reconstructions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1802–1811. [Google Scholar]
  21. Gojcic, Z.; Zhou, C.; Wegner, J.D.; Wieser, A. The perfect match: 3D point cloud matching with smoothed densities. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5545–5554. [Google Scholar]
  22. Choy, C.; Park, J.; Koltun, V. Fully convolutional geometric features. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8958–8966. [Google Scholar]
  23. Huang, S.; Gojcic, Z.; Usvyatsov, M.; Wieser, A.; Schindler, K. Predator: Registration of 3D point clouds with low overlap. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 4267–4276. [Google Scholar]
  24. Qin, Z.; Yu, H.; Wang, C.; Guo, Y.; Peng, Y.; Xu, K. Geometric transformer for fast and robust point cloud registration. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11143–11152. [Google Scholar]
  25. Barath, D.; Matas, J. Graph-cut RANSAC. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 6733–6741. [Google Scholar]
  26. Quan, S.; Yang, J. Compatibility-guided sampling consensus for 3-D point cloud registration. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7380–7392. [Google Scholar] [CrossRef]
  27. Yang, J.; Huang, Z.; Quan, S.; Qi, Z.; Zhang, Y. SAC-COT: Sample consensus by sampling compatibility triangles in graphs for 3-D point cloud registration. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5700115. [Google Scholar] [CrossRef]
  28. Chen, Z.; Sun, K.; Yang, F.; Tao, W. Sc2-pcr: A second order spatial compatibility for efficient and robust point cloud registration. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 13221–13231. [Google Scholar]
  29. Yang, J.; Li, H.; Campbell, D.; Jia, Y. Go-ICP: A globally optimal solution to 3D ICP point-set registration. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 2241–2254. [Google Scholar] [CrossRef] [PubMed]
  30. Bustos, A.P.; Chin, T.J. Guaranteed outlier removal for point cloud registration with correspondences. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 2868–2882. [Google Scholar] [CrossRef] [PubMed]
  31. Yang, H.; Shi, J.; Carlone, L. Teaser: Fast and certifiable point cloud registration. IEEE Trans. Robot. 2020, 37, 314–333. [Google Scholar] [CrossRef]
  32. Zhou, Q. Fast global registration. In Computer Vision—ECCV 2016: 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part II 14; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 766–782. [Google Scholar]
  33. Li, P.; Wang, M.; Fu, J.; Zhang, B. Efficient pose and motion estimation of non-cooperative target based on LiDAR. Appl. Opt. 2022, 61, 7820–7829. [Google Scholar] [CrossRef] [PubMed]
  34. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 652–660. [Google Scholar]
  35. Bechini, M.; Lavagna, M.; Lunghi, P. Dataset generation and validation for spacecraft pose estimation via monocular images processing. Acta Astronaut. 2023, 204, 358–369. [Google Scholar] [CrossRef]
  36. NASA 3D Resources Website. 2024. Available online: https://nasa3d.arc.nasa.gov/models (accessed on 7 December 2024).
  37. Biber, P.; Straßer, W. The normal distributions transform: A new approach to laser scan matching. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453), Las Vegas, NV, USA, 27–31 October 2003; Volume 3, pp. 2743–2748. [Google Scholar]
  38. Liu, W.; Wu, H.; Chirikjian, G.S. LSG-CPD: Coherent Point Drift with Local Surface Geometry for Point Cloud Registration. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 15293–15302. [Google Scholar]
  39. Li, J.; Zhang, C.; Xu, Z.; Zhou, H.; Zhang, C. Iterative distance-aware similarity matrix convolution with mutual-supervised point elimination for efficient point cloud registration. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXIV 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 378–394. [Google Scholar]
  40. Yew, Z.J.; Lee, G.H. Rpm-net: Robust point matching using learned features. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11824–11833. [Google Scholar]
Figure 1. Illustration of continuous ToF imaging of simulated spacecraft sequence.
Figure 2. Pipeline of the proposed method. Letters denote key stages: (A) Preprocessing and feature extraction, (B) Descriptor matching and outlier rejection, (C) RANSAC-based coarse registration, (D) Multi-scale iterative fine registration. Arrows indicate data flow direction between processing stages.
Figure 3. Illustration of the simulated normal maps of four spacecraft models.
Figure 4. Qualitative results with varying noise levels. The source point cloud and target point cloud are marked in yellow and blue, respectively, while transformed source point cloud with estimated pose is in red.
Figure 5. Comparison results on inlier ratio with different sample numbers. The source point cloud and target point cloud are marked in yellow and blue, and the correct matches and false matches are connected with green and red lines, respectively.
Figure 6. Comparison results with varying frame intervals. The source point cloud and target point cloud are marked in yellow and blue, respectively, while transformed source point cloud with estimated pose is in red.
Figure 7. Qualitative results with varying voxel sizes. The source point cloud and target point cloud are marked in yellow and blue, respectively, while transformed source point cloud with estimated pose is in red.
Table 1. Comparison results with relative rotation and translation error metrics. The best results are highlighted in bold, and the second-best are underlined.
Each cell reports RRE (°) / RTE (cm).

| Sequence | ICP | NDT | LSG-CPD | RobustICP | IDAM | RPM-Net | Ours |
|---|---|---|---|---|---|---|---|
| Aura | 6.76 / 28.21 | 8.01 / 31.28 | 0.89 / 8.01 | 3.37 / 17.37 | 1.21 / 10.29 | 1.14 / 11.45 | 1.07 / 13.17 |
| Chandra | 7.87 / 26.09 | 7.17 / 31.77 | 3.00 / 12.78 | 3.23 / 7.96 | 1.96 / 15.63 | 1.99 / 10.26 | 2.02 / 4.10 |
| DeepSpace | 18.37 / 97.30 | 23.60 / 113.96 | 2.58 / 10.01 | 3.27 / 28.71 | 3.45 / 11.78 | 3.01 / 9.23 | 1.66 / 9.13 |
| ICESat-2 | 9.14 / 42.57 | 12.91 / 68.36 | 1.31 / 9.06 | 5.40 / 15.22 | 2.05 / 8.89 | 1.57 / 8.87 | 2.37 / 12.70 |
| Jason-1 | 3.27 / 13.62 | 3.59 / 12.24 | 2.62 / 8.78 | 8.99 / 5.98 | 2.78 / 12.45 | 2.26 / 7.25 | 4.08 / 7.55 |
| Juno | 6.30 / 30.16 | 2.68 / 32.86 | 1.00 / 14.32 | 2.55 / 9.37 | 2.51 / 13.69 | 1.75 / 10.23 | 0.52 / 10.40 |
| Messenger | 7.78 / 27.33 | 8.16 / 32.14 | 2.09 / 4.53 | 2.80 / 16.25 | 1.92 / 5.01 | 1.88 / 11.47 | 1.78 / 5.68 |
| Topex | 32.33 / 26.84 | 19.95 / 28.48 | 2.55 / 7.54 | 4.00 / 8.59 | 2.66 / 7.88 | 3.24 / 8.01 | 1.71 / 4.79 |
| Average | 11.48 / 36.52 | 10.76 / 43.87 | 2.01 / 9.38 | 4.20 / 13.68 | 2.32 / 10.70 | 2.10 / 9.59 | 1.90 / 8.44 |
Table 2. Comparison results with different levels of noise. The best results are highlighted in bold, and the second-best are underlined.
Each cell reports RRE (°) / RTE (cm).

| Noise level | ICP | NDT | LSG-CPD | RobustICP | IDAM | RPM-Net | Ours |
|---|---|---|---|---|---|---|---|
| σ = 1 cm | 11.89 / 48.92 | 10.07 / 28.51 | 1.44 / 6.31 | 3.04 / 10.61 | 2.20 / 9.58 | 1.58 / 9.22 | 1.76 / 8.02 |
| σ = 2 cm | 11.48 / 36.52 | 10.76 / 43.89 | 2.01 / 9.38 | 4.20 / 13.68 | 2.32 / 10.70 | 2.10 / 9.59 | 1.90 / 8.44 |
| σ = 3 cm | 9.21 / 39.37 | 8.55 / 37.45 | 2.76 / 13.02 | 4.99 / 15.88 | 3.14 / 12.23 | 3.00 / 11.05 | 2.48 / 9.54 |
| σ = 4 cm | 9.86 / 34.17 | 8.41 / 39.12 | 3.26 / 13.85 | 5.83 / 16.47 | 5.98 / 14.69 | 4.97 / 12.89 | 1.85 / 8.93 |
| σ = 5 cm | 11.02 / 44.86 | 9.29 / 44.51 | 5.07 / 18.46 | 7.82 / 18.60 | 7.23 / 17.85 | 6.27 / 15.61 | 3.14 / 12.24 |
| Average | 10.69 / 40.77 | 9.41 / 38.70 | 2.91 / 12.20 | 5.18 / 15.05 | 4.17 / 13.01 | 3.58 / 11.67 | 2.23 / 9.43 |
Table 3. Ablation study on registration components. The best results are highlighted in bold.
| Configuration | RRE (°) Mean | RRE (°) Std | RTE (cm) Mean | RTE (cm) Std |
|---|---|---|---|---|
| Coarse-only | 5.32 | 1.15 | 12.87 | 2.42 |
| Fine-only | 8.71 | 2.63 | 14.25 | 3.38 |
| Full (Ours) | 1.90 | 0.73 | 8.44 | 1.02 |
Table 4. Performance analysis under varying voxel sizes.
| Voxel size | RRE (°) | RTE (cm) | Time (ms) |
|---|---|---|---|
| s = 2.5 cm | 1.49 | 7.22 | 179 |
| s = 5 cm | 1.90 | 8.44 | 99 |
| s = 7.5 cm | 2.45 | 12.25 | 88 |
| s = 10 cm | 4.23 | 15.78 | 67 |