Article

Relative Pose Estimation of an Uncooperative Target with Camera Marker Detection

by Batu Candan * and Simone Servadio
Department of Aerospace Engineering, Iowa State University, Ames, IA 50011, USA
* Author to whom correspondence should be addressed.
Aerospace 2025, 12(5), 425; https://doi.org/10.3390/aerospace12050425
Submission received: 18 March 2025 / Revised: 6 May 2025 / Accepted: 8 May 2025 / Published: 10 May 2025
(This article belongs to the Special Issue New Concepts in Spacecraft Guidance Navigation and Control)

Abstract

Accurate and robust relative pose estimation is the first step in ensuring the success of an active debris removal mission. This paper introduces a novel method to detect structural markers on the European Space Agency’s Environmental Satellite (ENVISAT) for safe de-orbiting using image processing and Convolutional Neural Networks (CNNs). Advanced image preprocessing techniques, including noise addition and blurring, are employed to improve marker detection accuracy and robustness from a chaser spacecraft. Additionally, we address the challenges posed by eclipse periods, during which the satellite’s corners are not visible, preventing measurement updates in the Unscented Kalman Filter (UKF). To maintain estimation quality in these periods of data loss, we propose a covariance-inflating approach in which the process noise covariance matrix is adjusted, reflecting the increased uncertainty in state predictions during the eclipse. This adaptation ensures more accurate state estimation and system stability in the absence of measurements. The initial results show promising potential for autonomous removal of space debris, supporting proactive strategies for space sustainability. The effectiveness of our approach suggests that our estimation method, combined with robust noise adaptation, could significantly enhance the safety and efficiency of debris removal operations by implementing more resilient and autonomous systems in actual space missions.

1. Introduction

The escalating problem of space debris necessitates innovative solutions for identification and removal with a high level of autonomy [1], particularly for large defunct satellites such as ESA's ENVISAT, an inactive Earth observation satellite. Over the past decade, deep learning (DL) has profoundly influenced the development of computer vision algorithms, enhancing their performance and robustness in applications such as image classification, segmentation, and object tracking. This momentum has carried into spacecraft pose estimation, where DL-based methods have begun to surpass the traditional feature-engineering techniques reported in the literature [2,3], such as the FAST, Shi-Tomasi, and Hough-transform corner and marker detection algorithms [4,5,6,7]. CNNs have shown promising results in spacecraft pose estimation tasks, particularly for non-cooperative targets. Previous studies have utilized a variety of techniques, including perspective-n-point (PnP) algorithms [8], blob detection methods [9], and region-based methods [10,11], to estimate spacecraft poses from monocular images. These methods, while effective under controlled conditions, often struggle with challenges such as poor lighting, sensor noise, and the complex geometries typical of space-based imagery [12]. Generally, the scarcity of extensive synthetic space image datasets, crucial for comprehensive CNN training, necessitates the use of networks pre-trained on terrestrial images. To adapt these for space applications, transfer learning is used, focusing on training only selected layers of the CNN [13]. Detection of key points such as corners involves predicting the 2D projections of specific 3D key points from the spacecraft's imaged segments using a deep learning model. These key points are usually determined from the CAD model of the spacecraft. In the absence of a CAD model, methods like multi-view triangulation or structure from motion are employed to create a 3D wire-frame model that includes these key points [14,15]. The key point location regression technique involves direct estimation of key point positions [16,17]. The segmentation-driven method utilizes a network with dual functions, segmentation and regression, to deduce key point locations. The image is sectioned into a grid, isolating the spacecraft within certain grid cells. The positions of the key points are then computed as offsets within these identified cells, which improves the prediction accuracy. Variations of this model have been optimized for space deployment, reducing the parameter count without compromising prediction accuracy [18,19]. The heat map prediction method represents the likelihood of key point locations, from which the highest-probability points are extracted as the actual locations. Moreover, there is a bounding box approach in which bounding boxes enclosing the key points are predicted along with confidence scores, rather than the key point locations or heat maps themselves [20,21]. Recent work [22] has highlighted the limitations of feature-based approaches and demonstrated how CNNs can overcome some of these challenges, improving robustness against sensor noise and environmental variations. In this study, we expand on these efforts by leveraging CNNs to detect structural markers on the ENVISAT satellite, which is crucial for autonomous space debris removal and de-orbiting operations.
On the other hand, estimation algorithms are an integral part of determining the spacecraft pose, and they involve deducing the value of a desired quantity from indirect, imprecise, and noisy observations. When this quantity is the current state of a dynamic system, the process is termed filtering, and its goal is the best estimate obtainable from the measurements. Moreover, estimation of the relative position and the target attitude is crucial for safe proximity operations. This necessitates complex on-board computations at a frequency ensuring accuracy requirements are met. However, the limited computational power of current processors constrains the estimation processes. Therefore, developing efficient algorithms that minimize computational demands while maintaining the necessary performance and reaching the best estimate is crucial for the success of these missions. This estimate is produced by an optimal estimator, a computational algorithm that effectively utilizes available data, system knowledge, and disturbance information. For linear and Gaussian scenarios, the posterior distribution remains Gaussian, and the Kalman Filter (KF) is used to compute its mean and covariance matrix. However, practical problems are often nonlinear, leading to non-Gaussian probability density functions (PDFs). Various techniques address nonlinear estimation problems in space applications. One straightforward method is to linearize the dynamics and measurement equations around the current estimate, as done in the Extended Kalman Filter (EKF) [23], which applies KF mechanics to a linearized system. Higher-order Taylor series approximations can extend the EKF's first-order approximation. The Gaussian Second-Order Filter (GSOF), for example, truncates the Taylor series at the second order to better handle system nonlinearity [24]. This requires knowledge of the estimation error's central moments up to the fourth order to calculate the Kalman gain: while the EKF uses a first-order truncation requiring only covariance matrices, the GSOF requires the third and fourth central moments of the state distribution. The GSOF approximates the prior PDF as Gaussian at each iteration and performs a linear update based on a second-order approximation of the posterior estimation error. Other linear filters use different approximations, such as Gaussian quadrature, spherical cubature, ensemble points, central differences, and finite differences [25]. Alternatively, the Unscented Kalman Filter (UKF) employs the unscented transformation to better manage nonlinearity in dynamics and measurements, typically achieving higher accuracy and robustness than the EKF. The unscented transformation avoids linearization by propagating carefully selected sample points through the nonlinear system, providing a more precise approximation of the predicted mean and covariance matrix, while the UKF remains a linear estimator whose estimate is a linear function of the current measurement [26,27,28]. Recently, the use of artificial intelligence methods for nonlinear estimation has also been reported in the literature [29,30].
In this paper, we present a novel approach that uses image processing from a chaser spacecraft to detect structural markers on the ENVISAT satellite, together with an estimation framework based on the UKF, to facilitate its safe de-orbiting. Utilizing high-resolution imagery, this study employs our own CNN, TinyCornerNET, designed for precise marker detection, essential for the subsequent removal process. The methodology incorporates image preprocessing, including noise addition, to enhance feature detection accuracy under varying space conditions. Each image taken by the camera is processed via the CNN, which provides corner measurements to the filter, and the filter predicts the required relative pose states. This is challenging due to the tumbling motion of ENVISAT. To date, no integrated solution to this problem has been reported; therefore, we propose a full pipeline for the relative pose navigation system, where the relative position and velocity states are estimated for both the translational and angular components. To enhance robustness and propose a realistic scenario, this work introduces a tuning process for the process noise covariance matrix ($Q$) during eclipse periods, when the satellite's corner markers become occluded and measurements are unavailable, which was not studied in recent works [31]. This approach ensures that the state estimation remains robust by accounting for increased uncertainty during these periods of no visibility, thus improving the accuracy of both translational and rotational state predictions. By adjusting $Q$ during eclipse periods, the UKF can effectively mitigate the risk of estimation divergence, maintaining system stability and ensuring accurate tracking even in the absence of real-time measurements. Preliminary results demonstrate the system's efficacy in identifying corner points on the satellite and its ability to maintain translational and rotational estimates within appropriate levels, promising a significant leap forward in automated space debris removal technologies. This work builds on recent advances in space debris monitoring and removal strategies, echoing the urgent call for action highlighted in studies such as [32,33,34], which emphasize the growing threat of space debris and the need for effective removal mechanisms [35]. Our findings indicate a scalable solution for debris management, aligned with the proactive strategies recommended by the Inter-Agency Space Debris Coordination Committee (IADC) [36].

2. Methodology

In this section, the methodology underlying the proposed approach is detailed. It covers the dynamics governing both the chaser and the target spacecraft, the simulation setup for the ENVISAT satellite, and the process of preparing the data for image-based measurements and estimation.

2.1. Dynamics

In the following analysis, several assumptions are made to represent the dynamics modeling the chaser and target spacecraft. Firstly, it is presumed that the inertia properties of both the chaser and the target are perfectly known beforehand. This assumption simplifies the dynamic modeling by removing uncertainties related to mass distribution and moments of inertia. Secondly, the absolute motion of the chaser is considered deterministic. Lastly, neither flexible dynamics nor external disturbances are taken into account. This implies that the translational and rotational dynamics are decoupled. These assumptions streamline the analysis by focusing on the primary dynamics without uncertain factors. However, they are validated by the knowledge of the scenario, as ENVISAT’s models are known and available to the public. Therefore, the motion of the chaser and the target can be represented as follows.

2.1.1. Absolute Chaser Motion

The chaser motion is described by the following equations, where $\mu$ is the gravitational parameter of the Earth, $\bar{r}$ is the radial position of the chaser center of mass with respect to the Earth, and $\theta$ is the true anomaly of the chaser orbit.
$$\ddot{\bar{r}} = \bar{r}\dot{\theta}^2 - \frac{\mu}{\bar{r}^2}$$
$$\ddot{\theta} = -\frac{2\,\dot{\bar{r}}\,\dot{\theta}}{\bar{r}}$$

2.1.2. Relative Translational Dynamics

The target, ENVISAT, has its relative translational dynamic equations developed with respect to the local-vertical-local-horizontal (LVLH) frame of the chaser. The target relative position, denoted as $r_r$, and relative velocity, $v_r$, are defined in the chaser LVLH frame, expressed as follows:
$$r_r = x\,\hat{i} + y\,\hat{j} + z\,\hat{k}$$
$$v_r = \dot{x}\,\hat{i} + \dot{y}\,\hat{j} + \dot{z}\,\hat{k}$$
In this context, $x$, $y$, and $z$ are the components of the position vector $r_r$ within the chaser LVLH frame, and $\hat{i}$, $\hat{j}$, and $\hat{k}$ represent the respective unit vectors of the reference frame. Thus, the equations of relative motion of the target can be written in the following way:
$$\ddot{x} = 2\dot{\theta}\dot{y} + \ddot{\theta}y + \dot{\theta}^2 x - \frac{\mu(\bar{r}+x)}{\left[(\bar{r}+x)^2 + y^2 + z^2\right]^{3/2}} + \frac{\mu}{\bar{r}^2}$$
$$\ddot{y} = -2\dot{\theta}\dot{x} - \ddot{\theta}x + \dot{\theta}^2 y - \frac{\mu y}{\left[(\bar{r}+x)^2 + y^2 + z^2\right]^{3/2}}$$
$$\ddot{z} = -\frac{\mu z}{\left[(\bar{r}+x)^2 + y^2 + z^2\right]^{3/2}}$$
Figure 1 gives a visual representation of the translational dynamics [37].

2.1.3. Rotational Dynamics

The relative orientation of the body-fixed frame on the target with respect to the body-fixed frame of the chaser can be represented with a rotation matrix $\Gamma$. Thus, the relative angular velocity $\omega_r$ in the target body-fixed reference frame depends on the angular velocity of the chaser $\omega_c$ and the angular velocity of the target $\omega_t$, both represented in their own body-fixed reference frames.
$$\omega_r = \omega_t - \Gamma\omega_c$$
$$\dot{\omega}_r = \dot{\omega}_t - \Gamma\dot{\omega}_c + \omega_r \times \Gamma\omega_c$$
The relative attitude of the target is established through the parametrization of the rotation matrix $\Gamma$. The Modified Rodrigues Parameters (MRPs) are utilized as a minimal set of three parameters that overcome singularities while describing the rotation. The quaternion representing the orientation of ENVISAT can be expressed as $q$, which consists of four components: a scalar one, $q_4$, and $q_1$, $q_2$, and $q_3$. The three-axis angular velocity of ENVISAT is represented by $\omega$, with components $\omega_x$, $\omega_y$, and $\omega_z$.
$$q = \begin{bmatrix} \bar{q} \\ q_4 \end{bmatrix}, \qquad \bar{q} = \begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix} = \hat{n}\sin\frac{\phi}{2}, \qquad q_4 = \cos\frac{\phi}{2}$$
$$p = \frac{\bar{q}}{1 + q_4} = \hat{n}\tan\frac{\phi}{4}$$
where $p$ is the three-dimensional MRP vector. The kinematic equation of motion can be derived using the target's relative angular velocity. Therefore, the time evolution of the MRP is described as follows.
$$\dot{p} = \frac{1}{4}\left[(1 - p^T p)\,I_3 + 2pp^T + 2[p\times]\right]\omega_r$$
Note that $I_3$ is a 3 × 3 identity matrix, and $[p\times]$ identifies the 3 × 3 cross-product matrix, given as follows:
$$[p\times] = \begin{bmatrix} 0 & -p_3 & p_2 \\ p_3 & 0 & -p_1 \\ -p_2 & p_1 & 0 \end{bmatrix}$$
The rotation matrix Γ that connects the chaser body-fixed frame and the target body-fixed frame can thus be derived as follows:
$$\epsilon_1 = \frac{4\left(1 - p^T p\right)}{\left(1 + p^T p\right)^2}$$
$$\epsilon_2 = \frac{8}{\left(1 + p^T p\right)^2}$$
$$\Gamma(p) = I_3 - \epsilon_1 [p\times] + \epsilon_2 [p\times]^2$$
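For illustration, the MRP-to-rotation-matrix mapping above translates directly into code. The following is a minimal NumPy sketch of the expressions for $\epsilon_1$, $\epsilon_2$, and $\Gamma(p)$; the function name is ours, and the snippet is illustrative rather than the implementation used in this study.

```python
import numpy as np

def mrp_to_rotation(p: np.ndarray) -> np.ndarray:
    """Build the rotation matrix Gamma(p) from an MRP vector p (shape (3,))."""
    # Cross-product matrix [p x]
    px = np.array([[0.0, -p[2], p[1]],
                   [p[2], 0.0, -p[0]],
                   [-p[1], p[0], 0.0]])
    s = float(p @ p)                         # p^T p
    eps1 = 4.0 * (1.0 - s) / (1.0 + s) ** 2
    eps2 = 8.0 / (1.0 + s) ** 2
    return np.eye(3) - eps1 * px + eps2 * px @ px
```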
The torque-free Euler equations describe the absolute rotational dynamics of the chaser. The relative attitude dynamics are obtained by substituting the kinematics relationship in the Euler absolute equations of the target spacecraft.
$$J_t\,\dot{\omega}_r + \omega_r \times J_t\,\omega_r = M_{app} - M_g - M_{ci}$$
Note that $J_t$ is the inertia matrix of the target, $M_{app}$ is the apparent torque, $M_g$ is the gyroscopic torque, and $M_{ci}$ is the chaser-inertial torque, defined as follows:
$$M_{app} = J_t\left(\omega_r \times \Gamma\omega_c\right)$$
$$M_g = \Gamma\omega_c \times J_t\,\Gamma\omega_c + \omega_r \times J_t\,\Gamma\omega_c + \Gamma\omega_c \times J_t\,\omega_r$$
$$M_{ci} = J_t\,\Gamma\dot{\omega}_c$$

2.2. ENVISAT Satellite Simulation

Detailed models of the satellite's orientation and movement patterns, replicating the complex dynamics encountered in orbit, are simulated in this section. By leveraging computational tools, we were able to create accurate ground truth data for both the pose and marker locations on the satellite by propagating the equations of motion of the chaser and target satellites using the Runge–Kutta–Fehlberg method. This simulation process was critical for establishing a reliable dataset that mirrors the real-world conditions the satellite would experience in space. Figure 2 visualizes the ENVISAT satellite and the geometrical dimensions used within the simulation environment (commas are decimal separators). The European Space Agency online sources provide details about ENVISAT's mass, the location of its center of mass, and its moments of inertia, as well as its geometric center and volume [38]. Figure 3 shows a series of simulated images of the ENVISAT satellite. Note that the solar panel is not included in the simulation; only the main rectangular body is modeled for assessing the accuracy of the corner detection algorithm and verification. The state vector, which has 12 components (the relative position between the target and chaser centers of mass, the relative velocity of the center of mass, the MRPs for attitude, and the angular velocities), and the true measurements of all corner locations (the ground truth for the corner detection algorithm) are stored after each simulation.

2.3. Data Preparation

Upon completion of the simulation, the generated images, which encapsulate the precise pose and marker information of the satellite, were transferred to a Python environment for further processing. The transition from MATLAB R2024b to Python 3.13.3 was essential for integrating the simulation data with the image processing and machine learning pipeline that follows. In Python, one of the initial steps involved converting the RGB images obtained from the simulation into gray-scale. This conversion is pivotal for the subsequent corner detection process, as working with gray-scale images reduces computational complexity and focuses the analysis on the structural information relevant to marker detection. By eliminating the color data, we can enhance the efficiency and accuracy of the CNN in identifying the critical corner locations on the satellite. The streamlined dataset, now optimized for neural network processing, sets the stage for the effective application of machine learning techniques in detecting the markers essential for the satellite’s pose estimation and debris removal strategy.
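As a minimal sketch of this preprocessing, the snippet below performs the gray-scale conversion described above, together with the Gaussian noise and blur degradations applied later in the pipeline (assuming OpenCV and NumPy; the parameter values are illustrative placeholders rather than those used in this study).

```python
import cv2
import numpy as np

def preprocess(rgb_image: np.ndarray,
               noise_sigma: float = 0.02,
               blur_ksize: int = 5) -> np.ndarray:
    """Convert a simulated RGB frame to gray-scale, then add the Gaussian
    noise and blur used to stress-test marker detection."""
    gray = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2GRAY).astype(np.float32) / 255.0
    gray += np.random.normal(0.0, noise_sigma, gray.shape)       # sensor noise
    gray = cv2.GaussianBlur(gray, (blur_ksize, blur_ksize), 0)   # optics blur
    return np.clip(gray, 0.0, 1.0)
```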

3. TinyCornerNET: A Lightweight CNN for Spaceborne Marker Detection

CNNs are a type of deep learning model designed to process data with a grid-like topology, such as images. CNNs use convolutional layers to automatically and adaptively learn spatial hierarchies of features from input data. Key components include convolutional layers, pooling layers, and fully connected layers. The architecture of CNNs has evolved from simpler models like AlexNet [39] to more complex ones like the High-Resolution Network (HRNet) [40]. These advancements have significantly enhanced their ability to perform tasks such as image and video recognition, object detection, and more. The advantages of using CNNs over traditional neural networks in computer vision include a reduced number of trainable parameters, improved generalization, less overfitting, and easier implementation of large-scale networks.

3.1. The CNN Architecture Basics

The CNN architecture consists of multiple layers, each serving a distinct function. These functions include the convolutional layer, pooling layer, activation function, fully connected layer, and loss function. Over the past decade, numerous CNN architectures have been introduced. These architectures have significantly enhanced performance across various applications through structural changes, regularization, and parameter optimization. Major improvements have stemmed from reorganizing processing units and developing new blocks, particularly by increasing network depth. This section examines key CNN architectures from AlexNet in 2012 to the High-Resolution model, analyzing features like input size, depth, and robustness to guide researchers in selecting the appropriate architecture for their tasks [41]. Figure 4 illustrates a typical CNN architecture [42].
The input layer is where the raw data (e.g., images) are fed into the network. The convolutional layer applies convolutional filters to the input data to extract features such as edges, textures, and patterns. Each filter creates a feature map by sliding over the input data and performing element-wise multiplications and summations. The rectified linear unit (ReLU) activation block function introduces non-linearity into the network. It replaces all negative pixel values in the feature maps with zero, allowing the network to learn more complex patterns. The pooling layer reduces the spatial dimensions of the feature maps (width and height) while retaining the most important information. Common types of pooling include max pooling, which selects the maximum value from a set of neighboring pixels, and average pooling, which computes the average value. The flattening block converts the 2D feature maps into a 1D vector, which can be fed into the fully connected layers. This step is necessary for transitioning from the convolutional layers to the fully connected layers. A fully connected layer connects every neuron in the previous layer to every neuron in the next layer. It combines the features extracted by the convolutional layers to make final predictions. This layer often includes several fully connected layers for deeper networks. Finally, the output layer produces the final predictions of the network. In classification tasks, this layer typically uses a softmax activation function to output a probability distribution over the possible classes [43].

3.2. Network Architecture

In this work, the chosen neural network model is TinyCornerNET, our lightweight CNN architecture developed specifically for structural marker detection on the ENVISAT satellite. Unlike more complex networks such as L-CNN [44] or stacked hourglass models, TinyCornerNET is optimized for efficiency and simplicity, making it highly suitable for real-time applications in spaceborne systems with limited computational resources. TinyCornerNET consists of a series of convolutional layers with ReLU activations and a final sigmoid-activated convolutional output layer that predicts a likelihood map of marker positions. The detailed architecture is given in Figure 5, and the parameters are summarized in Table 1. Note that it is trained with 100 random simulated images.
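A minimal PyTorch sketch of a TinyCornerNET-style network is given below. The layer count and channel widths here are illustrative assumptions (the actual configuration is summarized in Table 1); only the overall pattern, stacked convolutions with ReLU activations followed by a sigmoid-activated convolutional head producing a likelihood map, follows the description above.

```python
import torch
import torch.nn as nn

class TinyCornerNet(nn.Module):
    """Illustrative TinyCornerNET-style heatmap regressor (assumed widths)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 16, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )
        # 1x1 convolution + sigmoid -> per-pixel marker likelihood
        self.head = nn.Sequential(nn.Conv2d(16, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))  # (B, 1, H, W) heatmap

# Example: a gray-scale image batch in, a marker-likelihood map out
heatmap = TinyCornerNet()(torch.randn(1, 1, 256, 256))
```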

3.3. Marker Detection

TinyCornerNET outputs a heatmap representing the probability of marker locations in the image. A dynamic threshold based on the maximum heatmap intensity is applied to identify candidate corners. Non-Maximum Suppression (NMS) is then used to eliminate redundant detections, retaining only the most probable marker positions. The performance of TinyCornerNET marker detection is compared against traditional corner detection methods such as Harris [45], Shi-Tomasi [5], and FAST [4]. As shown in Figure 6, TinyCornerNET demonstrates superior reliability, detecting a higher number of markers across a variety of conditions, including noisy and blurred imagery. The performance results reported in the histogram are based on a complete simulation composed of 6035 simulated images, with the addition of mild pepper noise and affine translation-rotation that mimics the behavior of camera movements on a dynamic platform (shaking, vibrating, or rotating), which are counted as external disturbances.
As evident from the results, TinyCornerNET consistently detects a higher number of markers compared to traditional algorithms, particularly under degraded imaging conditions. The mean number of detected markers for TinyCornerNET was 6.37, clearly outperforming the L-CNN (5.36), Harris (3.47), Shi-Tomasi (5.19), and FAST (0.61) detectors. Note that our CNN detects a minimum of five and a maximum of seven corners for each image, so no cases with fewer detected corners appear for TinyCornerNET (blue) in the histogram. This superior performance can be attributed to TinyCornerNET's ability to learn spatial features directly from training data rather than relying on handcrafted feature extraction methods, which are often sensitive to noise and illumination changes. As a result, TinyCornerNET demonstrates greater robustness and adaptability, making it an ideal solution for marker detection tasks in dynamic and challenging spaceborne imaging environments. It is a fast, lightweight, and highly reliable marker detection method that greatly improves downstream processing pipelines such as 3D reconstruction and autonomous navigation in space missions. Its computational time is comparable to that of the Harris corner detector, taking approximately 250 s to process the 6035 images.
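The dynamic thresholding and NMS described at the beginning of this subsection can be sketched as follows; the relative threshold and suppression radius are illustrative choices, not the tuned values of this study.

```python
import numpy as np

def detect_corners(heatmap: np.ndarray, rel_thresh: float = 0.5,
                   nms_radius: int = 5) -> list:
    """Threshold a marker-likelihood heatmap at a fraction of its peak,
    then keep only the strongest response within each neighborhood."""
    thresh = rel_thresh * heatmap.max()            # dynamic threshold
    candidates = np.argwhere(heatmap >= thresh)    # (row, col) candidates
    # Visit candidates from strongest to weakest response
    order = np.argsort(heatmap[candidates[:, 0], candidates[:, 1]])[::-1]
    kept = []
    for r, c in candidates[order]:
        if all((r - kr) ** 2 + (c - kc) ** 2 > nms_radius ** 2
               for kr, kc in kept):
            kept.append((int(r), int(c)))          # non-maximum suppression
    return kept
```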

4. Filtering

The selected algorithm for this study is the UKF, used to estimate the relative pose of ENVISAT with respect to the chaser. The UKF is an estimation algorithm designed to handle nonlinear systems. Unlike the EKF, which relies on linearizing the system and measurement equations using first-order Taylor expansions, the UKF employs the unscented transformation to address nonlinearity directly. However, it is important to underline that it still employs a linear measurement update [46], which better fits the limited computational power of onboard computers. This method involves propagating a set of carefully chosen sample points through the nonlinear system to capture the posterior mean and covariance accurately. The UKF’s effectiveness is demonstrated through its application in various scenarios, such as the European Space Agency’s e.deorbit mission, which targets the ENVISAT satellite [47]. The performance of the UKF, evaluated through numerous numerical simulations, highlights its advantages in terms of accuracy, robustness, and computational efficiency, making it a preferred choice for handling the nonlinear dynamics of space missions [48].

4.1. The Filtering Algorithm

The weights for the sigma points in the UKF are determined using the parameters $\alpha$, $\beta$, and $\kappa$. These parameters control the spread and scaling of the sigma points around the mean. The parameter $\alpha$ determines the spread of the sigma points, $\kappa$ is a secondary scaling parameter, and $\beta$ is used to incorporate prior knowledge of the distribution. The scaling parameter $\lambda$ is computed as:
$$\lambda = \alpha^2\left(L + \kappa\right) - L$$
where L is the dimensionality of the state vector, 12 in our case. The weights for the mean and covariance of the sigma points are given as:
$$W_m^{(0)} = \frac{\lambda}{L + \lambda}$$
$$W_c^{(0)} = \frac{\lambda}{L + \lambda} + \left(1 - \alpha^2 + \beta\right)$$
$$W_m^{(i)} = W_c^{(i)} = \frac{1}{2(L + \lambda)}, \quad i = 1, \ldots, 2L$$
where $W_m^{(i)}$ are the weights for the mean and $W_c^{(i)}$ are the weights for the covariance. The UKF algorithm can be divided into the prediction step and the measurement update.

4.1.1. Prediction Step

The PDF is represented via a set of points:
$$\chi_{k-1}^{(i)} = \begin{cases} \hat{x}_{k-1} & \text{for } i = 0 \\ \hat{x}_{k-1} + \left(\sqrt{(L+\lambda)\,P_{k-1}}\right)_i & \text{for } i = 1, \ldots, L \\ \hat{x}_{k-1} - \left(\sqrt{(L+\lambda)\,P_{k-1}}\right)_{i-L} & \text{for } i = L+1, \ldots, 2L \end{cases}$$
where $\chi_{k-1}^{(i)}$ are the sigma points, $\hat{x}_{k-1}$ is the previous state estimate, and $P_{k-1}$ is the previous state covariance. The points are propagated to the current time step:
$$\chi_{k|k-1}^{(i)} = f\left(\chi_{k-1}^{(i)}, u_{k-1}\right)$$
where $f$ is the state transition function and $u_{k-1}$ is the control input. The predicted mean is calculated as follows:
$$\hat{x}_{k|k-1} = \sum_{i=0}^{2L} W_m^{(i)}\,\chi_{k|k-1}^{(i)}$$
and the predicted state covariance is as follows:
$$P_{k|k-1} = \sum_{i=0}^{2L} W_c^{(i)}\left(\chi_{k|k-1}^{(i)} - \hat{x}_{k|k-1}\right)\left(\chi_{k|k-1}^{(i)} - \hat{x}_{k|k-1}\right)^T + Q$$
where $Q$ is the process noise covariance accounting for additive disturbances.

4.1.2. Update Step

The update step of the filters starts with the evaluation of the sigma points in the measurement domain:
$$\gamma_k^{(i)} = h\left(\chi_{k|k-1}^{(i)}\right)$$
where $h$ is the measurement function and $\gamma_k^{(i)}$ are the sigma points transformed through the measurement function. The predicted measurement mean $\hat{z}_k$ is computed as follows:
$$\hat{z}_k = \sum_{i=0}^{2L} W_m^{(i)}\,\gamma_k^{(i)}$$
The predicted measurement covariance S k is calculated as:
$$S_k = \sum_{i=0}^{2L} W_c^{(i)}\left(\gamma_k^{(i)} - \hat{z}_k\right)\left(\gamma_k^{(i)} - \hat{z}_k\right)^T + R_k$$
where $R_k$ is the measurement noise covariance. $R_k$ is tuned to account for varying levels of camera noise and blur in the imagery used for detecting corners on the satellite. It is constructed as a 21 × 21 matrix to account for the noise in the position measurements of the seven structural markers on the satellite. Each marker provides measurements along three axes (x, y, and z), which results in three rows and columns per marker in the matrix. Therefore, for all seven markers, $R_k$ has a total of 21 rows and columns. The Kalman gain $K_k$ is calculated as follows.
$$T_k = \sum_{i=0}^{2L} W_c^{(i)}\left(\chi_{k|k-1}^{(i)} - \hat{x}_{k|k-1}\right)\left(\gamma_k^{(i)} - \hat{z}_k\right)^T$$
$$K_k = T_k S_k^{-1}$$
The state and its covariance can be updated with the following:
$$\hat{x}_k = \hat{x}_{k|k-1} + K_k\left(z_k - \hat{z}_k\right)$$
$$P_k = P_{k|k-1} - K_k S_k K_k^T$$
Overall, the UKF stands out as a powerful tool in the realm of nonlinear estimation, offering significant improvements over traditional methods by leveraging its nonlinear transformation capabilities.
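For reference, the prediction and update cycle defined by the equations above condenses into a short NumPy sketch. Here f and h stand for the propagation and measurement functions of Sections 2 and 5; the control input is omitted for brevity, so this is a schematic of the algorithm rather than the flight implementation.

```python
import numpy as np

def ukf_step(x, P, z, f, h, Q, R, alpha=0.25, beta=2.0, kappa=3.0):
    """One UKF cycle: sigma points -> prediction -> linear measurement update."""
    L = x.size
    lam = alpha**2 * (L + kappa) - L
    Wm = np.full(2 * L + 1, 1.0 / (2.0 * (L + lam)))
    Wc = Wm.copy()
    Wm[0] = lam / (L + lam)
    Wc[0] = lam / (L + lam) + (1.0 - alpha**2 + beta)

    # Sigma points from the Cholesky factor of (L + lam) * P
    S = np.linalg.cholesky((L + lam) * P)
    chi = np.vstack([x, x + S.T, x - S.T])                 # (2L+1, L)

    # Prediction step
    chi_p = np.array([f(c) for c in chi])
    x_pred = Wm @ chi_p
    P_pred = Q + sum(Wc[i] * np.outer(chi_p[i] - x_pred, chi_p[i] - x_pred)
                     for i in range(2 * L + 1))

    # Measurement update
    gam = np.array([h(c) for c in chi_p])
    z_pred = Wm @ gam
    S_k = R + sum(Wc[i] * np.outer(gam[i] - z_pred, gam[i] - z_pred)
                  for i in range(2 * L + 1))
    T_k = sum(Wc[i] * np.outer(chi_p[i] - x_pred, gam[i] - z_pred)
              for i in range(2 * L + 1))
    K = T_k @ np.linalg.inv(S_k)
    return x_pred + K @ (z - z_pred), P_pred - K @ S_k @ K.T
```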

5. The Measurement Model

The observations provided to the filter use a measurement model that combines the translational and rotational states. However, the propagation of the dynamics can still be separated into translational and rotational components, resulting in faster and more efficient estimation of the relative translational states and velocities and of the relative rotational states and angular velocities. This approach allows the state vector, although 12 components long, to be divided into two separate parts of 6 components each, propagating the translational and rotational models in parallel. The filter uses a 4th-order Runge–Kutta integrator for propagation, to assess its suitability for onboard application. Before beginning the estimation, the filter uses information from the previous step to calculate the marker visibility. Depending on the simulation requirements, the filter can operate with the entire set of markers or limit measurements to three markers, with the selection process explained later. Additionally, measurement failures can be included in the simulation to show the robustness of the proposed technique. Finally, the estimated relative states are compared with the true states, propagated with high accuracy, to evaluate the performance of the filter.

5.1. Marker Creation

Filters require accurate measurements to effectively correct the predicted values and accurately determine the target’s attitude. Most filters for space applications depend on camera image processing. In practical scenarios, the image processing software is configured to identify target points in each captured image, known as markers. The software processes the camera image, identifies the marker positions, and then transmits this information to the filter. Selecting markers is a complex task influenced by the target’s shape, volume, and color, as the image processing must be rapid. Commonly, target corners are selected as markers, utilizing reliable corner detection algorithms like the Harris–Stephens [45] and Förstner algorithms. Effective interaction between the filter and the image processing software is crucial. After an initial period during which the first measurements are taken and the position error rapidly decreases, the communication should be optimized to expedite marker estimation. Once the filter completes its iterative cycle, it can inform the camera where to search for markers in the subsequent image. This enables the camera software to analyze a smaller image region, reducing the need to process all pixels and focusing only on those near the predicted marker positions. In the context of the ENVISAT relative pose estimation problem, it was decided to use the corners of the main body as markers and track their positions over time. Consequently, the marker positions can offer valuable information since the position of each marker is well known with respect to the center of mass. By tracking the trajectory of these markers, the filter can reconstruct the state of the spacecraft and calculate its relative position and velocity. The main body of ENVISAT, excluding the solar panel, can be modeled as a simple parallelepiped with eight corners. These corners are selected as the filter markers, and their positions relative to the center of mass are known. Each marker is identified by a letter, resulting in markers labeled A, B, C, D, E, F, G, and H, as reported in the previous studies [37].

5.2. Camera Transformation and 3D Reconstruction from 2D Pixel Data

To accurately estimate the 3D positions of the markers on the satellite, it is necessary to transform the 2D pixel coordinates detected by the CNN module into 3D space. This transformation is achieved using both the intrinsic and extrinsic camera parameters. Figure 7 visually demonstrates how the camera transformation works from world coordinates to pixel coordinates. Note that the selected camera model is a pinhole camera model [49].
In the figure, $x_w$, $y_w$, and $z_w$ are the world coordinates of the satellite corners; $x_c$, $y_c$, and $z_c$ are the coordinates of the satellite's corners in the camera frame; $x_i$ and $y_i$ are the coordinates of the satellite's corners in the image frame; and $u$ and $v$ are the pixel locations to be detected by the CNN.

5.2.1. Intrinsic Camera Parameters

The intrinsic parameters describe the internal characteristics of the camera, including its focal length, principal point, and pixel scaling. These parameters are represented by the camera intrinsic matrix $C_{int}$, a 3 × 3 matrix that maps the 3D coordinates of a point in the camera's coordinate system to 2D image plane coordinates. The intrinsic matrix is defined as:
$$C_{int} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$
where $f_x$ and $f_y$ are the focal lengths in the x and y directions, respectively. Parameters $c_x$ and $c_y$ represent half the width and height of the image plane, respectively. In the pinhole camera model, they are used to ensure that the $u$ and $v$ coordinates are always positive. This transforms the image coordinate system to have a domain of $[0, 2c_x]$ and $[0, 2c_y]$ for $u$ and $v$, respectively, rather than $[-c_x, c_x]$ and $[-c_y, c_y]$. This matrix is applied to the 3D points in camera space to project them onto the 2D image plane, producing pixel coordinates. The CNN module detects these pixel locations for the satellite's corners.

5.2.2. Extrinsic Camera Parameters

The extrinsic parameters define the position and orientation of the camera in the world coordinate system. These parameters are encapsulated in the extrinsic transformation matrix $[\Gamma \,|\, t]$, where $\Gamma$ is a 3 × 3 rotation matrix (from the MRPs) and $t$ is a 3 × 1 translation vector. The extrinsic matrix transforms the 3D world coordinates of the satellite into the camera's coordinate system:
$$C_{ext} = \begin{bmatrix} \Gamma & t \\ 0 & 1 \end{bmatrix}$$
Therefore, the transformation from world coordinates X world to camera coordinates X camera is given by:
$$X_{camera} = \Gamma X_{world} + t$$

5.2.3. Conversion from 2D Pixel Coordinates to 3D World Coordinates

To reconstruct the 3D positions of the satellite's markers from 2D image coordinates, a standard camera transformation process is employed. Starting from the 2D pixel coordinates $(u, v)$ detected by the CNN module, the following transformation is applied. Using the intrinsic camera matrix $C_{int}$, the 2D pixel coordinates $(u, v)$ are mapped to camera coordinates $X_{camera}$ as follows:
$$X_{camera} = C_{int}^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} Z_{camera}$$
Here, $Z_{camera}$ is the depth of the point, and $C_{int}^{-1}$ is the inverse of the intrinsic matrix. This step translates the 2D pixel coordinates into the camera's 3D space by introducing the depth information, obtained as part of the relative position state estimate.

5.2.4. Transformation to World Coordinates

Once the 3D coordinates in the camera space are computed, the inverse of the extrinsic matrix [ Γ | t ] is used to convert these into world coordinates:
$$X_{world} = \Gamma^{-1}\left(X_{camera} - t\right)$$
By applying these transformations, the detected 2D pixel coordinates from the CNN module are converted into 3D world coordinates, enabling accurate state estimation of the satellite’s relative position and orientation by providing measurements. Figure 8 visually summarizes the transformations between coordinate frames.
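Under the pinhole model above, the full chain from a detected pixel to world coordinates can be sketched as follows; the intrinsic values shown are placeholders, and the depth is supplied by the filter's relative position estimate as described in Section 5.2.3.

```python
import numpy as np

def pixel_to_world(u, v, depth, C_int, Gamma, t):
    """Back-project pixel (u, v) at estimated depth, then map to world frame."""
    # Pixel -> camera frame: X_cam = C_int^{-1} [u, v, 1]^T * Z_camera
    x_cam = np.linalg.inv(C_int) @ np.array([u, v, 1.0]) * depth
    # Camera -> world frame: X_world = Gamma^{-1} (X_cam - t)
    return np.linalg.inv(Gamma) @ (x_cam - t)

# Placeholder intrinsics: focal lengths and principal point are illustrative
C_int = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
```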

5.3. Measurement Equations

The state vector has 12 components that can be divided into 4 parts. Each part, composed of three elements, describes one aspect of the pose of the target satellite, ENVISAT.
$$x = \left(x, y, z, \dot{x}, \dot{y}, \dot{z}, p_1, p_2, p_3, \omega_x, \omega_y, \omega_z\right)$$
Note that the full simulation works with 18 states for the propagation of absolute states as well, i.e., the chaser’s orbit in terms of position and velocity with respect to the ECI frame. The four key components to be tracked are the relative position between the centers of mass of the target and chaser, the relative velocity of the center of mass, the Modified Rodrigues Parameters (MRPs), and the angular velocities. Therefore, it is necessary to evaluate the marker positions based on the predicted state to compare the predicted measurements with the actual measurements obtained from the camera system during the update step of the filter. Let u represent the position vector of the chaser’s center of mass relative to ENVISAT’s center of mass. The measurements for each marker are calculated individually as follows: the marker position vector v i is initially expressed in the target’s reference frame, known from the available CAD models of ENVISAT. This vector is then transformed into the chaser’s reference frame by multiplying it with the rotation matrix Γ . Finally, the position measurement of each marker relative to the chaser is determined through a simple vector difference:
$$z_i = \Gamma^T v_i - u$$
Here, $z_i$ represents the position measurement of a marker relative to the chaser's center of mass, and the rotation matrix $\Gamma$ is derived from the MRPs, meaning that the measurement model couples the translational and rotational states. Figure 9 demonstrates how the vectors are arranged in the measurement equations [37].

5.4. Measurement Noise Covariance Calculation

A statistical estimation procedure was developed based on repeated TinyCornerNET detections over a single image to obtain a physically accurate measurement noise covariance matrix $R$ that reflects real-world sensor uncertainty. We empirically derive $R$ from repeated runs of TinyCornerNET under simulated variability, resulting in a data-driven characterization of the 3D measurement uncertainty rather than arbitrary or constant values. The procedure starts with a chosen image projected from the satellite simulation, along with the corresponding ground truth corner locations and depths. TinyCornerNET detects the 2D corner positions over multiple iterations under randomized sensor perturbations. These perturbations, which mimic real-world phenomena like camera shake, sensor noise, and image deterioration, include affine jitter, random noise injection, and controlled blur. Each set of identified 2D pixel positions is transformed into 3D coordinates. For each marker $i \in \{A, B, \ldots, H\}$, this process yields a distribution of 3D measurements $\{z_i^{(k)}\}_{k=1}^{N}$, where $N$ is the number of Monte Carlo runs. The sample covariance of these measurements is then computed as follows:
$$R_i = \frac{1}{N-1}\sum_{k=1}^{N}\left(z_i^{(k)} - \bar{z}_i\right)\left(z_i^{(k)} - \bar{z}_i\right)^T$$
where $\bar{z}_i$ is the mean of the 3D positions of marker $i$. To capture a global characterization of the sensor noise model, all residuals across all markers are stacked, and a global covariance matrix $R$ is estimated:
$$R = \mathrm{cov}\left(\bigcup_i \left\{ z_i^{(k)} - \bar{z}_i \right\}_{k=1}^{N}\right)$$
The resulting covariance matrix is integrated into the UKF initialization, ensuring the filter’s confidence in measurements. Figure 10 visualizes the CNN-detected marker spread around the ground truth for a sample image, where ellipses represent 3 σ contours of the 2D covariance. Based on the TinyCornerNET detection statistics, the measurement noise covariance matrix for a single corner in 3D space was estimated through repeated randomized simulations. The resulting covariance captures the expected uncertainties, incorporating the effects of sensor noise, blur, and affine perturbations. The final estimated R matrix for a single corner is given as follows:
$$R_{SingleCorner} = \begin{bmatrix} 4\times10^{-2} & 0 & 2\times10^{-2} \\ 0 & 1\times10^{-2} & 0 \\ 2\times10^{-2} & 0 & 4\times10^{-2} \end{bmatrix}$$
As shown, the off-diagonal elements reveal a correlation between the x and z components, while the y-axis (depth) remains uncorrelated. This structure reflects the physical imaging characteristics, where depth uncertainty arises independently of the lateral uncertainties. Since each satellite corner marker is assumed to be independent, the full measurement noise matrix $R$ for all markers is constructed by stacking the individual $R_{SingleCorner}$ blocks along the diagonal, forming a 21 × 21 covariance matrix for the measurement vector when all visible markers are detected.
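A sketch of this estimation procedure is given below; perturb and detect_and_reconstruct are hypothetical stand-ins for the sensor perturbation and the TinyCornerNET-plus-back-projection pipeline described above, and the number of runs is an illustrative choice.

```python
import numpy as np
from scipy.linalg import block_diag

def estimate_R(image, markers, perturb, detect_and_reconstruct, n_runs=500):
    """Monte Carlo estimate of R: repeat perturbed detections of one image,
    back-project to 3D, and take the per-marker sample covariance."""
    samples = {m: [] for m in markers}
    for _ in range(n_runs):
        # detect_and_reconstruct returns {marker: 3D position} for one image
        for m, pos in detect_and_reconstruct(perturb(image)).items():
            samples[m].append(pos)
    # Per-marker sample covariance R_i (np.cov expects variables in rows)
    R_blocks = [np.cov(np.array(samples[m]).T, ddof=1) for m in markers]
    # Markers are assumed independent: stack the 3x3 blocks on the diagonal
    return block_diag(*R_blocks)
```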

5.5. Markers Visibility

The presented measurement model relies on the positions of the eight corners of ENVISAT's main body. However, the camera cannot capture all marker positions in a single frame because some markers are obscured by ENVISAT's structure itself. As a result, the filter cannot use the entire set of markers simultaneously; it has to adjust its measurements based on the visible markers. Consequently, the size of the measurement vector varies depending on the number of visible markers. Each marker provides three position measurements as an observation, so the measurement vector $\hat{z}$ will have $3i$ components, where $i = 0, \ldots, 7$ is the number of visible markers. The visible markers are located on the visible faces of the satellite, which are easier for the CNN to detect. Thus, the detection process focuses on identifying these visible faces for marker extraction. The requirement for face visibility is determined by the following equation:
$$u \cdot \Gamma\hat{n}_i < 0 \quad \text{for } i = \psi_1, \ldots, \psi_8$$
where $\psi_i$ defines the $i$th face of the main body. If the scalar product between the relative position vector $u$ of the chaser with respect to the target and the unit vector $\hat{n}_i$ perpendicular to the face is negative, the face is oriented towards the camera, making the markers on that face visible. The filter uses the state information at the beginning of each observation to predict which markers will be visible in the next step, preparing to receive the correct number of measurements from the camera. The filter hardcodes the mapping between ENVISAT's faces and the relative markers according to the vectors $\hat{n}_i$, thereby predicting marker visibility. The filter handles marker visibility in a binary manner, assigning a value of one if a marker is visible and zero if it is not, so that it can adapt the measurement dimensionality.
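The visibility test reduces to a sign check on the scalar product, as in the following sketch (the interface is ours, for illustration):

```python
import numpy as np

def visible_faces(u, Gamma, face_normals):
    """u: chaser-target relative position; face_normals: unit normals of the
    main-body faces in the target frame. A face whose rotated normal points
    toward the camera (negative dot product) has visible markers."""
    return [i for i, n in enumerate(face_normals)
            if float(np.dot(u, Gamma @ n)) < 0.0]
```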

5.6. Marker Association

Once the CNN detects the potential marker positions, these measurements are compared to the predicted locations obtained using Equation (44) for each marker through the filter's estimated states. The association process is refined by minimizing the total distance between the predicted and measured positions. The total minimum distance is calculated using the Euclidean distance to ensure that the predicted marker locations are as close as possible to the measured ones. Mathematically, this operation can be expressed by constructing a cost function, $J$, that minimizes the squared distance between the predicted marker locations, $\phi_j$, and the measured marker locations, $\Xi_i$. The cost function $J$ selects the pairings $(i, j)$ that minimize the total squared distance, effectively associating each predicted marker with the closest measured marker. This can be formulated as follows:
$$J = \arg\min_{(i,j)} \sum_{i,j} \left(\Xi_i - \phi_j\right)^2$$
Figure 11 visually demonstrates the marker association process that uses the previous equation to match the specific markers with the corners.
Dashed yellow lines represent the distance to the second-closest corner, while the orange solid line shows the shortest path to a specific corner. The marker association optimization ensures that the markers identified by the CNN are associated with the correct position vector v i .
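The association can be implemented, for example, as an optimal assignment over the squared-distance cost; the sketch below uses the Hungarian algorithm, which is one way to minimize the cost function $J$ above and is not necessarily the implementation used in this study.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(measured, predicted):
    """measured: (M, 3) CNN detections; predicted: (P, 3) marker positions
    from the filter state. Returns (measured_idx, predicted_idx) pairs."""
    diff = measured[:, None, :] - predicted[None, :, :]
    cost = np.sum(diff ** 2, axis=2)        # squared Euclidean distances
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))
```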

6. Numerical Results

Before discussing the results of this work, it is appropriate to summarize all the components discussed so far in an end-to-end pipeline. Therefore, Figure 12 provides a pipeline diagram of the method presented in this work. Moreover, as shown in Figure 9, ENVISAT's center of mass is offset from its geometric center, lying closer to one of the solar panel sides. This asymmetry in the mass distribution adds robustness to the marker association during the initial and transient parts of the filtering simulation.
Note that in the estimation framework, initial uncertainties are accounted for by introducing random noise into all states. Specifically, zero-mean normal distributions with varying standard deviations are applied to these states to reflect their initial conditions, which are given in Table 2. For the position states, a normal distribution with a standard deviation of 1.0 is used, indicating relatively high uncertainty in the initial position estimate. The velocity states are subjected to a tighter distribution with a standard deviation of $1.0\times10^{-1}$. Similarly, the MRPs, representing the satellite's attitude, are modeled with a normal distribution with a standard deviation of $1.0\times10^{-2}$, indicating moderately high uncertainty in the initial orientation. Lastly, the angular velocity states are assigned a standard deviation of $1.0\times10^{-3}$. The parameters for the UKF are selected as follows: $\alpha = 0.25$, $\beta = 2.0$, and $\kappa = 3.0$. These values are chosen based on standard guidelines for UKF performance and the system dynamics. All other simulation-related parameters, such as the inertia tensor and mass of ENVISAT, are taken from the literature [50].
Our initial findings demonstrate promising outcomes in the detection and analysis of structural markers on the ENVISAT satellite through processed imagery. This approach not only enhances the robustness of our detection algorithms under varied operational scenarios but also aligns with the typical image quality captured by space-borne sensors. Each simulated image undergoes a preprocessing phase in which Gaussian noise and blur are applied to test the robustness and performance of the software. The addition of Gaussian noise is intended to mimic the electronic noise and sensor imperfections typically found in spacecraft imaging systems; it simulates the random variations in pixel intensity that occur due to various factors. Simultaneously, a Gaussian blur is applied to replicate the blurring effect caused by minute focusing discrepancies or motion effects that can occur in a spacecraft's optical system [12]. Table 3 shows the camera and simulation parameters selected within this study.
Figure 13 demonstrates the preliminary results of the CNN algorithm. After marker identification, the measurements coming from the CNN algorithm are fed into the UKF framework as the filter measurements for the relative pose estimation. As a reminder, it is assumed that a priori knowledge of the geometrical and physical characteristics of both the chaser and the target is available, with rigid body dynamics, and that the chaser motion is deterministic and without external disturbances or control actions. The camera cannot detect all marker positions in a single frame because parts of ENVISAT's structure obscure some markers. Consequently, the filter does not operate with the entire set of markers but adjusts its measurements frame-by-frame based on which markers are visible. The visibility and correct association of a corner to its corresponding marker are crucial, as locating more markers tends to enhance the estimation accuracy since it increases the observability of the system. Moreover, the presence of Gaussian noise and blur in the measurements further exacerbates the degradation in estimation quality. These factors introduce additional variability and error into the measurements, which in turn affect the overall accuracy of the position and orientation estimates.
In order to understand the effects of measurement noise and the number of visible markers, Monte Carlo runs were executed within the UKF framework. Note that in Figure 14, Figure 15 and Figure 16, the red dashed lines represent the three-sigma values from the updated error covariance matrix of the filter, the solid blue lines are the three-sigma values obtained from the sampled errors of the Monte Carlo analysis, and the gray lines are single Monte Carlo runs of the simulation. The effective (sampled) errors are computed as the root-sum-square (RSS) differences between the true states and the estimated mean states. Note that the effective error of the Monte Carlo simulation represents the actual error spread of the filter, calculated by running multiple simulations with different initial conditions. The estimated errors, derived from the covariance matrix, are computed as the square root of the sum of the diagonal elements corresponding to the filter states. The difference can be clearly seen in Figure 14 and Figure 15, where the standard deviation of the measurement noise (zero-mean Gaussian) for the sensors is increased from 0.02 to 0.2. These values are selected to simulate mild to moderate noise on digital, simulated images. This dual impact of fewer markers and increased noise highlights the challenges in accurately determining the satellite's state under suboptimal conditions. These results underscore the importance of maximizing the number of observed markers and minimizing noise to achieve high-quality estimation of the satellite's relative position and orientation. Finally, Figure 16 shows that with fewer markers observed throughout the time period, the estimation accuracy also decreases.
Again, in Figure 14, Figure 15 and Figure 16, the red dashed lines represent the three-sigma values derived from the estimated error covariance matrix, and the solid blue lines show the effective three-sigma values obtained from the sampled errors; the two are close to each other and encapsulate the Monte Carlo errors. Consequently, the UKF has performed well, maintaining estimation accuracy despite the challenges posed by reduced marker observations and other uncertainties in the simulation, such as the measurement noise on the sensors. The filter is able to predict its own uncertainties, and the initially large error covariance is reduced rapidly to steady-state values. Indeed, to show convergence, the figures are zoomed around the steady state, while the initially large uncertainties are out of the graph (for visualization purposes). Error spikes are visible around some points, especially for the rotational position and velocity errors. The fundamental reason behind this phenomenon is the positioning of ENVISAT in the camera frame and its orientation, which reduces observability. It is important to state that the filter robustness and accuracy are sensitive to the filter frequency, which is set at 1 Hz for this study. The filter tends to perform less accurately below this frequency threshold due to sampling-rate and discretization errors, while it remains convergent above it.

Robustness to Missing Observations: The Eclipse

For the ENVISAT satellite, the eclipse duration was calculated to be 35.57 min. This eclipse occurs when the satellite passes into the Earth's shadow, during which the Sun's light is blocked and no direct sunlight reaches the satellite. Regardless of possible detection aided by other light sources, a worst-case scenario is assumed in which, during this eclipse period, the corners and markers of the satellite are never visible to the cameras used for measurement. As a result, there are no measurement updates in the UKF during this time. The UKF relies on visual tracking to update the satellite's state estimates, and without sufficient visibility, the necessary data for these updates are unavailable. Therefore, during the eclipse period, the estimation process continues without measurement updates, relying solely on the prediction step to propagate the relative pose uncertainties. Figure 17 visualizes the eclipse phase, with the Sun placed far along the X axis.
In order to mitigate the impact of missing measurements and maintain estimation quality, we introduce and adjust the process noise covariance matrix $Q$ during the eclipse, which starts around the 2000th second of the simulation and continues for approximately 35.57 min. While measurements are unavailable, the fidelity of the uncertainty estimate rests solely on the propagation step. In an ideal scenario, the predicted PDF, approximated as Gaussian by the UKF, would perfectly describe the true non-Gaussian PDF, whose shape is unknown. In order not to lose custody of the target, the UKF predicted covariance needs to guarantee, with very high confidence, that the full uncertainties of the true PDF are enclosed in its prediction. This is achieved through covariance inflation, which ensures that the algorithm counteracts distributions with long tails and many outliers, as usually result from highly nonlinear dynamics (such as ENVISAT's). The matrix $Q$ reflects uncertainties in the system dynamics, and increasing its values during the eclipse according to the covariance-inflating methodology helps ensure that the predicted uncertainty adequately encompasses the potential spread of the state. This inflation prevents the filter from becoming overconfident in its predictions and helps retain robustness under high nonlinearity and absent measurements. As the system is highly nonlinear and the distribution becomes non-Gaussian, the noise inflation prevents degeneracy from outlier observations or states, providing robustness across every possible realization. As shown in the results, the covariance inflation guarantees that every possible Monte Carlo simulation, not just in a 3σ boundary sense, is enclosed in the uncertainty, so that the UKF does not lose fidelity and custody of the target. Once the satellite exits the Earth's shadow and the eclipse ends, measurement updates resume, and the UKF rapidly reduces the accumulated prediction error by incorporating sensor data. To assess the filter's consistency under these conditions, Monte Carlo simulations were conducted. Figure 18 compares the estimated covariances for the eclipse and no-eclipse cases, where solid blue lines denote the three-sigma bounds from the estimated covariance under eclipse, red dashed lines represent the no-eclipse case, and gray lines show the Monte Carlo errors. In Figure 19, Figure 20, Figure 21 and Figure 22, red dashed lines show the three-sigma values from the sampled errors, solid blue lines represent the three-sigma values from the estimated covariance, black lines show the mean error, and gray lines denote the Monte Carlo realizations.
The results demonstrate that the filter remains consistent and accurate in tracking the satellite's state under nominal conditions. Notably, during the eclipse period, when no measurements are available because of the occlusion of the satellite markers, an adjustment of the process noise covariance matrix is implemented. Despite the lack of direct measurement updates during the eclipse, the Monte Carlo errors remain within the expected bounds, indicating that the $Q$ adaptation preserves the filter's accuracy and robustness throughout the eclipse period. The matrix $Q$ is a diagonal matrix in which each diagonal element represents the process noise for a specific state of the system. The first three values ($0, 0, 0$) correspond to the position states and have negligible process noise. The next three values ($2\times10^{-14}$) are for the translational velocity states, reflecting low uncertainty. The following three values ($8\times10^{-14}$) are for the rotational states, represented by the MRPs, indicating higher uncertainty. The last three values ($5\times10^{-14}$) correspond to the rotational velocity states, with moderate uncertainty. This diagonal structure allows for independent control of the noise level of each state. Table 4 shows the diagonal values of $Q$.
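As an illustration, the eclipse-dependent selection of $Q$ can be sketched as follows; the inflation factor is an assumed placeholder, while the nominal diagonal follows Table 4 and the eclipse window follows the timing given above.

```python
import numpy as np

# Nominal diagonal process noise (values from Table 4)
Q_NOMINAL = np.diag([0.0] * 3 + [2e-14] * 3 + [8e-14] * 3 + [5e-14] * 3)

def process_noise(t, eclipse_start=2000.0, eclipse_duration=35.57 * 60.0,
                  inflation=100.0):
    """Return the (possibly inflated) Q for simulation time t in seconds.
    The inflation factor is illustrative, not the tuned value."""
    in_eclipse = eclipse_start <= t < eclipse_start + eclipse_duration
    return inflation * Q_NOMINAL if in_eclipse else Q_NOMINAL
```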
Beyond the eclipse scenario, a Monte Carlo analysis was also run for a case without any eclipse to establish a consistent baseline for comparison; it is reported in Appendix A.

7. Discussion

This paper presents a robust approach to relative pose navigation for the tumbling ENVISAT spacecraft, emphasizing the online collaboration between the navigation filter and a CNN-based image processing pipeline. As part of this study, we introduce TinyCornerNET, a lightweight CNN specifically designed for robust corner detection in spaceborne imagery. Unlike large-scale models, TinyCornerNET achieves high accuracy while maintaining computational efficiency, making it well suited for onboard or real-time processing. Its architecture was optimized to reliably detect structural markers on satellite targets under various noise and distortion conditions, and the results show that it outperforms traditional corner detectors in both accuracy and robustness. In each cycle, the CNN processes the incoming images to extract corner measurements, which the filter then uses to predict and update the uncertainty of the system. Overall, the proposed pipeline not only addresses the complexity of autonomous rendezvous with a non-cooperative target but also lays a foundation for future advancements in space debris monitoring and active removal.
A further key contribution of this work is the adjustment of the process noise covariance matrix via covariance inflation during eclipse periods. This adaptation ensures the continued accuracy of the UKF, allowing it to maintain state estimation despite the absence of direct sensor updates. Monte Carlo simulations, with and without eclipse scenarios, demonstrate the robustness and consistency of the proposed method. In eclipse conditions, the Q adjustment effectively compensates for the uncertainty introduced by the lack of measurements, keeping estimation errors within the expected bounds. The reason is that Q represents the uncertainty in the system's dynamics, specifically the degree to which the model's predictions may diverge from the true state due to unmodeled, unknown, or highly nonlinear dynamics, disturbances, or inaccuracies in the dynamic model. By increasing the values in Q during the eclipse, the filter acknowledges the wider spread of uncertainty and adjusts its predictions accordingly. A higher Q increases the predicted state covariance, allowing the filter to reflect the potential range of error in its predictions more accurately; this prevents the filter from becoming overconfident and mitigates the risk of large deviations between the predicted and true states. Once measurements are available again, the filter quickly recovers, reducing the accumulated uncertainty and maintaining overall consistency. The scenario without an eclipse confirms the filter's nominal performance with continuous measurements. These findings suggest that the proposed approach offers a scalable, reliable solution for space debris management and contributes significantly to the advancement of autonomous de-orbiting systems.
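To make the per-cycle interaction concrete, the sketch below outlines one navigation cycle with placeholder callables (cnn_detect, ukf_predict, ukf_update); it is a schematic of the pipeline described above, not our released implementation.

```python
def navigation_cycle(image, x, P, t, cnn_detect, ukf_predict, ukf_update):
    """One chaser navigation cycle: CNN corner extraction followed by the
    UKF predict/update steps. All callables are illustrative placeholders."""
    corners = cnn_detect(image)   # TinyCornerNET: pixel coordinates of visible markers
    x, P = ukf_predict(x, P, t)   # propagate relative pose and covariance
    if corners is not None:       # markers visible (no eclipse / occlusion)
        x, P = ukf_update(x, P, corners)
    return x, P
```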
In future work, we plan to integrate the CNN module directly within the UKF framework to further refine the pose estimation and tracking of space objects. This integration aims to overcome the limitations of current tracking systems by providing more accurate, rapid, and reliable state estimation under the complex and dynamic conditions of space environments. Moreover, the UKF will be improved by implementing an online adaptation of the measurement noise matrix rather than fixing it as a constant matrix, an approach reported to improve estimation quality in several studies [51,52,53]. Future work should also focus on incorporating stochastic absolute measurements into the estimation process and enhancing the interaction between the camera system and the filter: introducing stochastic elements can account for uncertainties and variabilities in absolute measurements, improving the robustness and accuracy of state estimation, while optimizing the camera-filter interaction can allow the filter to guide the camera toward specific regions of interest, reducing computational load and improving real-time performance. Finally, we will adopt the full ENVISAT model with all its detailed features (solar panels, body panels, etc.) and test the corner detection performance under these more realistic conditions to better evaluate the robustness of our method.

Author Contributions

Conceptualization, B.C. and S.S.; methodology, B.C.; software, B.C.; validation, B.C.; formal analysis, B.C.; investigation, B.C.; resources, S.S.; data curation, B.C.; writing—original draft preparation, B.C.; writing—review and editing, B.C. and S.S.; visualization, B.C.; supervision, S.S.; project administration, S.S.; funding acquisition, S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Iowa State University start-up fund of S. Servadio.

Data Availability Statement

The data presented in this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.15377547.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

In addition to the eclipse scenario, a Monte Carlo analysis was performed for a scenario in which no eclipse occurs, providing a baseline consistency check. In this case, the satellite's markers remain visible throughout the simulation, allowing for continuous measurement updates in the UKF. The Monte Carlo errors, sampled error covariance three-sigma bounds, and estimated covariance three-sigma bounds are reported here. The results show that the filter consistently tracks the satellite's state, with the errors remaining within the expected three-sigma bounds. Since no interruptions in measurements occur, the need for adjustments to the process noise covariance matrix is minimized, and the filter operates with nominal performance. This analysis demonstrates the filter's accuracy, consistency, and robustness in the nominal condition. In Figure A1, Figure A2, Figure A3 and Figure A4, the red dashed lines represent the effective three-sigma values obtained from the sampled errors, the solid blue lines represent the three-sigma values derived from the estimated error covariance matrix, and the gray lines are the Monte Carlo errors for the no-eclipse case.
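For readers reproducing these plots, the following is a minimal sketch of the consistency check behind Figures A1-A4 (assuming the Monte Carlo errors and estimated covariances have been logged as NumPy arrays; the array names are illustrative):

```python
import numpy as np

def three_sigma_bounds(errors, P_est):
    """Compare sampled and filter-estimated dispersion.

    errors: (n_runs, n_steps, n_states) state errors from the Monte Carlo runs.
    P_est:  (n_steps, n_states, n_states) estimated covariances from one run.
    Returns the effective 3-sigma envelope of the sampled errors (red dashed
    in the figures) and the filter's own 3-sigma bound (solid blue).
    """
    sampled = 3.0 * errors.std(axis=0)                               # (n_steps, n_states)
    estimated = 3.0 * np.sqrt(np.diagonal(P_est, axis1=1, axis2=2))  # (n_steps, n_states)
    return sampled, estimated
```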
Figure A1. UKF Monte Carlo analysis for position estimation without eclipse.
Figure A2. UKF Monte Carlo analysis for velocity estimation without eclipse.
Figure A3. UKF Monte Carlo analysis for angular position estimation without eclipse.
Figure A4. UKF Monte Carlo analysis for angular velocity estimation without eclipse.

References

1. Starek, J.A.; Açıkmeşe, B.; Nesnas, I.A.; Pavone, M. Spacecraft Autonomy Challenges for Next-Generation Space Missions. In Advances in Control System Technology for Aerospace Applications; Feron, E., Ed.; Springer: Berlin/Heidelberg, Germany, 2016; pp. 1–48.
2. Song, J.; Rondao, D.; Aouf, N. Deep learning-based spacecraft relative navigation methods: A survey. Acta Astronaut. 2022, 191, 22–40.
3. Sharma, S.; Beierle, C.; D'Amico, S. Pose estimation for non-cooperative spacecraft rendezvous using convolutional neural networks. In Proceedings of the 2018 IEEE Aerospace Conference, Big Sky, MT, USA, 3–10 March 2018; pp. 1–12.
4. Trajković, M.; Hedley, M. Fast corner detection. Image Vis. Comput. 1998, 16, 75–87.
5. Shi, J.; Tomasi, C. Good features to track. In Proceedings of the 1994 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 21–23 June 1994; pp. 593–600.
6. Capuano, V.; Alimo, S.R.; Ho, A.Q.; Chung, S.J. Robust Features Extraction for On-board Monocular-based Spacecraft Pose Acquisition. In Proceedings of the AIAA Scitech 2019 Forum, San Diego, CA, USA, 7–11 January 2019.
7. Sharma, S.; Ventura, J.; D'Amico, S. Robust Model-Based Monocular Pose Initialization for Noncooperative Spacecraft Rendezvous. J. Spacecr. Rocket. 2018, 55, 1414–1429.
8. Sharma, S.; D'Amico, S. Comparative assessment of techniques for initial pose estimation using monocular vision. Acta Astronaut. 2016, 123, 435–445.
9. Zhang, G.; Kontitsis, M.; Filipe, N.; Tsiotras, P.; Vela, P.A. Cooperative Relative Navigation for Space Rendezvous and Proximity Operations using Controlled Active Vision. J. Field Robot. 2016, 33, 205–228.
10. Liu, C.; Hu, W. Relative pose estimation for cylinder-shaped spacecrafts using single image. IEEE Trans. Aerosp. Electron. Syst. 2014, 50, 3036–3056.
11. Shi, J.F.; Ulrich, S. Uncooperative Spacecraft Pose Estimation Using Monocular Monochromatic Images. J. Spacecr. Rocket. 2021, 58, 284–301.
12. Pauly, L.; Rharbaoui, W.; Shneider, C.; Rathinam, A.; Gaudillière, V.; Aouada, D. A survey on deep learning-based monocular spacecraft pose estimation: Current state, limitations and prospects. Acta Astronaut. 2023, 212, 339–360.
13. Pasqualetto Cassinis, L.; Fonod, R.; Gill, E. Review of the robustness and applicability of monocular pose estimation systems for relative navigation with an uncooperative spacecraft. Prog. Aerosp. Sci. 2019, 110, 100548.
14. Chen, B.; Cao, J.; Parra, A.; Chin, T.J. Satellite Pose Estimation with Deep Landmark Regression and Nonlinear Pose Refinement. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 2816–2824.
15. Huo, Y.; Li, Z.; Zhang, F. Fast and Accurate Spacecraft Pose Estimation From Single Shot Space Imagery Using Box Reliability and Keypoints Existence Judgments. IEEE Access 2020, 8, 216283–216297.
16. Huan, W.; Liu, M.; Hu, Q. Pose Estimation for Non-cooperative Spacecraft based on Deep Learning. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; pp. 3339–3343.
17. Price, A.; Yoshida, K. A Monocular Pose Estimation Case Study: The Hayabusa2 Minerva-II2 Deployment. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA, 19–25 June 2021; pp. 1992–2001.
18. Legrand, A.; Detry, R.; De Vleeschouwer, C. End-to-end Neural Estimation of Spacecraft Pose with Intermediate Detection of Keypoints. In Computer Vision—ECCV 2022 Workshops; Karlinsky, L., Michaeli, T., Nishino, K., Eds.; Springer: Cham, Switzerland, 2023; pp. 154–169.
19. Hu, Y.; Hugonot, J.; Fua, P.; Salzmann, M. Segmentation-Driven 6D Object Pose Estimation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Los Alamitos, CA, USA, 15–20 June 2019; pp. 3380–3389.
20. Cosmas, K.; Kenichi, A. Utilization of FPGA for Onboard Inference of Landmark Localization in CNN-Based Spacecraft Pose Estimation. Aerospace 2020, 7, 159.
21. Piazza, M.; Maestrini, M.; Di Lizia, P. Monocular Relative Pose Estimation Pipeline for Uncooperative Resident Space Objects. J. Aerosp. Inf. Syst. 2022, 19, 613–632.
22. Bechini, M.; Gu, G.; Lunghi, P.; Lavagna, M. Robust spacecraft relative pose estimation via CNN-aided line segments detection in monocular images. Acta Astronaut. 2024, 215, 20–43.
23. Kim, S.G.; Crassidis, J.L.; Cheng, Y.; Fosbury, A.M.; Junkins, J.L. Kalman Filtering for Relative Spacecraft Attitude and Position Estimation. J. Guid. Control Dyn. 2007, 30, 133–143.
24. Cavenago, F.; Di Lizia, P.; Massari, M.; Wittig, A. On-board spacecraft relative pose estimation with high-order extended Kalman filter. Acta Astronaut. 2019, 158, 55–67.
25. Servadio, S.; Zanetti, R. Estimation of the Conditional State and Covariance With Taylor Polynomials. J. Adv. Inf. Fusion 2021, 6, 126–142.
26. Julier, S.; Uhlmann, J. Unscented filtering and nonlinear estimation. Proc. IEEE 2004, 92, 401–422.
27. Servadio, S.; Cavenago, F.; Di Lizia, P.; Massari, M. Nonlinear Prediction in Marker-Based Spacecraft Pose Estimation with Polynomial Transition Maps. J. Spacecr. Rocket. 2022, 59, 511–523.
28. Servadio, S.; Parker, W.; Linares, R. Uncertainty Propagation and Filtering via the Koopman Operator in Astrodynamics. J. Spacecr. Rocket. 2023, 60, 1639–1655.
29. Zhou, X.; Qiao, D.; Li, X. Neural Network-Based Method for Orbit Uncertainty Propagation and Estimation. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 1176–1193.
30. Zhou, X.; Qiao, D.; Li, X. Adaptive Order-Switching Kalman Filter for Orbit Determination Using Deep-Neural-Network-Based Nonlinearity Detection. J. Spacecr. Rocket. 2023, 60, 1724–1741.
31. Candan, B.; Servadio, S. Markers Identification for Relative Pose Estimation of an Uncooperative Target. In Proceedings of the AAS/AIAA Astrodynamics Specialist Conference, Broomfield, CO, USA, 11–15 August 2024.
32. Klinkrad, H. Space Debris: Models and Risk Analysis; Springer: Berlin/Heidelberg, Germany, 2006.
33. Liou, J.; Johnson, N.L. Risks in Space from Orbiting Debris. Science 2006, 311, 340–341.
34. Servadio, S.; Jang, D.; Linares, R. Threat Level Estimation From Possible Break-Up Events In LEO. In Proceedings of the AIAA SCITECH 2024 Forum, Orlando, FL, USA, 8–12 January 2024.
35. Servadio, S.; Simha, N.; Gusmini, D.; Jang, D.; St. Francis, T.; D'Ambrosio, A.; Lavezzi, G.; Linares, R. Risk Index for the Optimal Ranking of Active Debris Removal Targets. J. Spacecr. Rocket. 2024, 61, 407–420.
36. Mejía-Kaiser, M. IADC Space Debris Mitigation Guidelines; UNOOSA: Vienna, Austria, 2020; pp. 381–389.
37. Servadio, S. High Order Filters for Relative Pose Estimation of an Uncooperative Target. Ph.D. Thesis, Politecnico di Milano, Milan, Italy, 2017.
38. European Space Agency (ESA). Envisat Mission Description. 2025. Available online: https://earth.esa.int/eogateway/missions/envisat/description (accessed on 28 April 2025).
39. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; Pereira, F., Burges, C., Bottou, L., Weinberger, K., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2012; Volume 25.
40. Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep High-Resolution Representation Learning for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 3349–3364.
41. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.Q.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53.
42. Tabian, I.; Fu, H.; Sharif Khodaei, Z. A Convolutional Neural Network for Impact Detection and Characterization of Complex Composite Structures. Sensors 2019, 19, 4933.
43. Ajit, A.; Acharya, K.; Samanta, A. A Review of Convolutional Neural Networks. In Proceedings of the 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), Vellore, India, 24–25 February 2020; pp. 1–5.
44. Zhou, Y.; Qi, H.; Ma, Y. End-to-End Wireframe Parsing. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019.
45. Harris, C.G.; Stephens, M.J. A Combined Corner and Edge Detector. In Proceedings of the Alvey Vision Conference, Manchester, UK, 31 August–2 September 1988.
46. Servadio, S.; Zanetti, R. Recursive polynomial minimum mean-square error estimation with applications to orbit determination. J. Guid. Control Dyn. 2020, 43, 939–954.
47. Cavenago, F.; Di Lizia, P.; Massari, M.; Servadio, S.; Wittig, A. DA-based nonlinear filters for spacecraft relative state estimation. In Proceedings of the 2018 Space Flight Mechanics Meeting, Kissimmee, FL, USA, 8–12 January 2018.
48. Servadio, S.; Zanetti, R.; Jones, B.A. Nonlinear filtering with a polynomial series of Gaussian random variables. IEEE Trans. Aerosp. Electron. Syst. 2020, 57, 647–658.
49. Corke, P. Robotics, Vision and Control: Fundamental Algorithms in MATLAB; Springer: Berlin/Heidelberg, Germany, 2011.
50. Gómez, N.O.; Walker, S.J. Earth's gravity gradient and eddy currents effects on the rotational dynamics of space debris objects: Envisat case study. Adv. Space Res. 2015, 56, 494–508.
51. Candan, B.; Soken, H.E. Robust Attitude Estimation Using IMU-Only Measurements. IEEE Trans. Instrum. Meas. 2021, 70, 1–9.
52. Soken, H.E.; Hajiyev, C. UKF-Based Reconfigurable Attitude Parameters Estimation and Magnetometer Calibration. IEEE Trans. Aerosp. Electron. Syst. 2012, 48, 2614–2627.
53. Candan, B.; Soken, H.E. Robust Attitude Estimation Using Magnetic and Inertial Sensors. IFAC-PapersOnLine 2023, 56, 4502–4507.
Figure 1. Visual representation of the elements of translational dynamics.
Figure 2. ENVISAT satellite with its geometrical dimensions.
Figure 3. Simulated images of ENVISAT satellite.
Figure 4. Visual summary of the CNN architecture.
Figure 5. Overview of TinyCornerNET architecture.
Figure 6. Histogram comparison of TinyCornerNET with traditional methods.
Figure 7. Camera transformation from world coordinates to pixel coordinates.
Figure 8. Camera transformation with image plane center (c_x, c_y) illustrated.
Figure 9. Arrangement of the vectors for the measurement equation.
Figure 10. CNN corner detection spread under noise.
Figure 11. Marker association process for CNN-detected corners.
Figure 12. End-to-end pipeline of overall methodology.
Figure 13. Preliminary results of the CNN corner detection algorithm.
Figure 14. UKF Monte Carlo results for σ = 0.02 noise on sensors.
Figure 15. UKF Monte Carlo results for σ = 0.2 noise on sensors.
Figure 16. UKF Monte Carlo results for σ = 0.02 noise on sensors with only 3 markers visible.
Figure 17. The orbit of ENVISAT around the Earth.
Figure 18. UKF RMSE Monte Carlo analysis for angular velocity estimation with eclipse.
Figure 19. UKF Monte Carlo analysis for position estimation with eclipse.
Figure 20. UKF Monte Carlo analysis for velocity estimation with eclipse.
Figure 21. UKF Monte Carlo analysis for angular position estimation with eclipse.
Figure 22. UKF Monte Carlo analysis for angular velocity estimation with eclipse.
Table 1. Architecture of TinyCornerNET for marker detection.

Layer Type        Output Size         Parameters
Input image       1280 × 1280 × 1     -
Conv1 (ReLU)      1280 × 1280 × 16    3 × 3 filter, 16 filters
Conv2 (ReLU)      1280 × 1280 × 32    3 × 3 filter, 32 filters
Conv3 (ReLU)      1280 × 1280 × 64    3 × 3 filter, 64 filters
Conv4 (Sigmoid)   1280 × 1280 × 1     1 × 1 filter, 1 filter
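A minimal PyTorch sketch of this layout is given below; the 3 × 3 convolutions are assumed to use unit padding so that the 1280 × 1280 resolution in Table 1 is preserved, and all other details (weight initialization, training loss) are omitted assumptions.

```python
import torch
import torch.nn as nn

class TinyCornerNET(nn.Module):
    """Illustrative sketch of the Table 1 layout: four convolutions that map
    a grayscale image to a per-pixel corner probability map."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),   # Conv1
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # Conv2
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),  # Conv3
            nn.Conv2d(64, 1, kernel_size=1), nn.Sigmoid(),           # Conv4: heat map
        )

    def forward(self, x):          # x: (B, 1, H, W) grayscale image
        return self.features(x)    # (B, 1, H, W) corner probabilities

model = TinyCornerNET()
# Fully convolutional, so any input size works; the paper uses 1280 x 1280.
heatmap = model(torch.zeros(1, 1, 256, 256))
```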
Table 2. Initial conditions.

Translational Dynamics              Rotational Dynamics
x (m)      −0.002                   ϕ (roll, rad)    1.66
y (m)      −31.17                   θ (pitch, rad)   2.27
z (m)      0                        ψ (yaw, rad)     −0.38
ẋ (m/s)    −3.5 × 10⁻⁶              ω_x (rad/s)      0.02
ẏ (m/s)    −2.0 × 10⁻⁶              ω_y (rad/s)      0.02
ż (m/s)    0                        ω_z (rad/s)      0.04
Table 3. Camera intrinsic properties and simulation parameters.

Parameter                             Value
FoV (field of view)                   45 (degree)
f_x (focal length in x-direction)     1280 (px)
f_y (focal length in y-direction)     1280 (px)
c_x (principal point x-coordinate)    640 (px)
c_y (principal point y-coordinate)    640 (px)
Affine translation (std. dev.)        3 px (in both x and y)
Affine rotation (std. dev.)           1 (degree)
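These intrinsics define the pinhole camera matrix K that maps camera-frame points to pixel coordinates; the short sketch below illustrates the projection, with the marker position in the camera frame chosen as a hypothetical example.

```python
import numpy as np

# Pinhole intrinsics from Table 3.
fx = fy = 1280.0   # focal lengths [px]
cx = cy = 640.0    # principal point [px]
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

p_cam = np.array([0.5, -0.2, 30.0])  # hypothetical marker in the camera frame [m]
uvw = K @ p_cam                      # homogeneous pixel coordinates
u, v = uvw[:2] / uvw[2]              # projected pixel location (u ~ 661.3, v ~ 631.5)
```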
Table 4. Diagonal values of the Q matrix, categorized by position, velocity, angular position, and angular velocity, used during the eclipse.

State                                 Standard Deviation
Position (x, y, z)                    0, 0, 0 (m)
Velocity (v_x, v_y, v_z)              2 × 10⁻¹⁴, 2 × 10⁻¹⁴, 2 × 10⁻¹⁴ (m/s)
Angular Position (r_x, r_y, r_z)      8 × 10⁻¹⁴, 8 × 10⁻¹⁴, 8 × 10⁻¹⁴ (rad)
Angular Velocity (ω_x, ω_y, ω_z)      5 × 10⁻¹⁴, 5 × 10⁻¹⁴, 5 × 10⁻¹⁴ (rad/s)
