1. Introduction
The escalating problem of space debris necessitates innovative solutions for identification and removal with a high level of autonomy [1], particularly for large defunct satellites such as ESA's ENVISAT, an inactive Earth observation satellite. Over the past decade, deep learning (DL) has profoundly influenced the development of computer vision algorithms, enhancing their performance and robustness in applications such as image classification, segmentation, and object tracking. This momentum has carried into spacecraft pose estimation, where DL-based methods have begun to surpass the traditional feature-engineering techniques reported in the literature [2,3], such as corner and marker detection algorithms based on FAST, Shi-Tomasi, and Hough transform methods [4,5,6,7]. CNNs have shown promising results in spacecraft pose estimation tasks, particularly for non-cooperative targets. Previous studies have utilized a variety of techniques, including perspective-n-point (PnP) algorithms [8], blob detection methods [9], and region-based methods [10,11], to estimate spacecraft poses from monocular images. While effective under controlled conditions, these methods often struggle with challenges such as poor lighting, sensor noise, and the complex geometries typical of space-based imagery [12]. Because extensive synthetic space image datasets, crucial for comprehensive CNN training, are scarce, networks pre-trained on terrestrial images are commonly used; transfer learning adapts them to space applications by retraining only selected layers of the CNN [13]. Key point detection involves predicting the 2D projections of specific 3D key points, such as corners, from the spacecraft's imaged segments using a deep learning model. These key points are usually determined from the CAD model of the spacecraft; in the absence of a CAD model, methods like multi-view triangulation or structure from motion are employed to create a 3D wireframe model that includes these key points [14,15]. The key point location regression technique estimates the key point positions directly [16,17]. The segmentation-driven method utilizes a network with dual functions, segmentation and regression, to deduce key point locations: the image is divided into a grid, the spacecraft is isolated within certain grid cells, and the key point positions are computed as offsets within those cells, which improves prediction accuracy. Variations of this model have been optimized for space deployment, reducing the parameter count without compromising prediction accuracy [18,19]. The heat map prediction method represents the likelihood of key point locations, from which the highest-probability points are extracted as the actual locations. There is also a bounding box approach in which bounding boxes enclosing the key points are predicted along with confidence scores, rather than the key point locations or heat maps directly [20,21]. Recent work [22] has highlighted the limitations of feature-based approaches and demonstrated how CNNs can overcome some of these challenges, improving robustness against sensor noise and environmental variations. In this study, we expand on these efforts by leveraging CNNs to detect structural markers on the ENVISAT satellite, which is crucial for autonomous space debris removal and de-orbiting operations.
On the other hand, estimation algorithms are an integral part of determining the spacecraft pose: they deduce the value of a desired quantity from indirect, imprecise, and noisy observations. When this quantity is the current state of a dynamic system, the process yields the best estimate obtainable from the measurements. Moreover, estimation of the relative position and the target attitude is crucial for safe proximity operations, which requires complex on-board computations at a frequency ensuring that accuracy requirements are met. However, the limited computational power of current processors constrains the estimation process. Therefore, developing efficient algorithms that minimize computational demands while maintaining the necessary performance and reaching the best estimate is crucial for such missions. This estimate is produced by an optimal estimator, a computational algorithm that effectively utilizes the available data, system knowledge, and disturbance information. For linear and Gaussian scenarios, the posterior distribution remains Gaussian, and the Kalman Filter (KF) is used to compute its mean and covariance matrix. However, practical problems are often nonlinear, leading to non-Gaussian probability density functions (PDFs). Various techniques address nonlinear estimation problems in space applications. One straightforward method is to linearize the dynamics and measurement equations around the current estimate, as done in the Extended Kalman Filter (EKF) [23], which applies KF mechanics to a linearized system. Higher-order Taylor series approximations can extend the EKF's first-order approximation. The Gaussian Second-Order Filter (GSOF), for example, truncates the Taylor series at the second order to better handle system nonlinearity [24]; this requires knowledge of the estimation error's central moments up to the fourth order to calculate the Kalman gain. Whereas the EKF's first-order truncation requires only covariance matrices, the GSOF requires the third and fourth central moments of the state distribution. The GSOF approximates the prior PDF as Gaussian at each iteration and performs a linear update based on a second-order approximation of the posterior estimation error. Other linear filters use different approximations, such as Gaussian quadrature, spherical cubature, ensemble points, central differences, and finite differences [25]. Alternatively, the Unscented Kalman Filter (UKF) employs the unscented transformation to better manage nonlinearity in the dynamics and measurements, typically achieving higher accuracy and robustness than the EKF. The unscented transformation avoids linearization by propagating carefully selected sample points through the nonlinear system, yielding a more precise approximation of the predicted mean and covariance matrix, while the UKF remains a linear estimator in the sense that the estimate is a linear function of the current measurement [26,27,28]. Recently, the use of artificial intelligence methods for nonlinear estimation has also been reported in the literature [29,30].
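To illustrate the mechanism behind the unscented transformation, the following sketch generates the standard $2n+1$ sigma points and propagates them through a nonlinear function. This is a generic textbook construction with the common $\alpha$, $\beta$, $\kappa$ scaling parameters, not the exact implementation used later in this work:

```python
import numpy as np

def sigma_points(mean, cov, alpha=1e-3, beta=2.0, kappa=0.0):
    """Generate the 2n+1 sigma points and weights of the unscented transform."""
    n = mean.size
    lam = alpha**2 * (n + kappa) - n
    # Matrix square root of the scaled covariance; its columns are the offsets.
    L = np.linalg.cholesky((n + lam) * cov)
    pts = [mean] + [mean + L[:, i] for i in range(n)] + [mean - L[:, i] for i in range(n)]
    Wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    Wc = Wm.copy()
    Wm[0] = lam / (n + lam)
    Wc[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    return np.array(pts), Wm, Wc

def unscented_transform(f, mean, cov, **kw):
    """Propagate mean and covariance through a nonlinear f without linearizing."""
    pts, Wm, Wc = sigma_points(mean, cov, **kw)
    ys = np.array([f(p) for p in pts])
    y_mean = Wm @ ys
    d = ys - y_mean
    return y_mean, (Wc[:, None] * d).T @ d
```

For a linear function the transform is exact, which provides a quick sanity check of any implementation.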
In this paper, we present a novel approach that uses image processing on board a chaser spacecraft to detect structural markers on the ENVISAT satellite, together with a UKF-based estimation framework to facilitate its safe de-orbiting. Utilizing high-resolution imagery, this study employs our own CNN, TinyCornerNET, designed for precise marker detection, essential for the subsequent removal process. The methodology incorporates image preprocessing, including noise addition, to enhance feature detection accuracy under varying space conditions. Each image taken by the camera is processed by the CNN, which provides corner measurements to the filter, and the filter predicts the required relative pose states. This is challenging due to the tumbling motion of ENVISAT. To date, there is no merged solution for this problem; therefore, we propose a full pipeline for the relative pose navigation system, in which the relative position and velocity states are estimated for both the translational and angular components. To enhance robustness and propose a realistic scenario, this work introduces a tuning process that adjusts the noise covariance matrix during eclipse periods, when the satellite's corner markers become occluded and measurements are unavailable, which was not studied in recent works [31]. This approach keeps the state estimation robust by accounting for the increased uncertainty during these periods of no visibility, thus improving the accuracy of both the translational and rotational state predictions. By adjusting the noise covariance during eclipse periods, the UKF can effectively mitigate the risk of estimation divergence, maintaining system stability and ensuring accurate tracking even in the absence of real-time measurements. Preliminary results demonstrate the system's efficacy in identifying corner points on the satellite and its ability to maintain translational and rotational estimates within appropriate bounds, promising a significant step forward in automated space debris removal technologies. This work builds on recent advances in space debris monitoring and removal strategies, echoing the urgent call for action highlighted in studies such as [32,33,34], which emphasize the growing threat of space debris and the need for effective removal mechanisms [35]. Our findings indicate a scalable solution for debris management, aligned with the proactive strategies recommended by the Inter-Agency Space Debris Coordination Committee (IADC) [36].
2. Methodology
In this section, the methodology underlying the proposed approach is detailed. It covers the dynamics governing both the chaser and the target spacecraft, the simulation setup for the ENVISAT satellite, and the process of preparing the data for image-based measurements and estimation.
2.1. Dynamics
In the following analysis, several assumptions are made to represent the dynamics modeling the chaser and target spacecraft. Firstly, it is presumed that the inertia properties of both the chaser and the target are perfectly known beforehand. This assumption simplifies the dynamic modeling by removing uncertainties related to mass distribution and moments of inertia. Secondly, the absolute motion of the chaser is considered deterministic. Lastly, neither flexible dynamics nor external disturbances are taken into account. This implies that the translational and rotational dynamics are decoupled. These assumptions streamline the analysis by focusing on the primary dynamics without uncertain factors. However, they are validated by the knowledge of the scenario, as ENVISAT’s models are known and available to the public. Therefore, the motion of the chaser and the target can be represented as follows.
2.1.1. Absolute Chaser Motion
The chaser motion is described by the following equations, where $\mu$ is the gravitational parameter of the Earth, $r_c$ is the position of the chaser center of mass with respect to the Earth, and $\nu$ is the true anomaly in the orbit of the chaser:

$$\ddot{r}_c = r_c \dot{\nu}^2 - \frac{\mu}{r_c^2}, \qquad \ddot{\nu} = -\frac{2 \dot{r}_c \dot{\nu}}{r_c}$$
2.1.2. Relative Translational Dynamics
The target, ENVISAT, has its relative translational dynamic equations developed with respect to the local-vertical–local-horizontal (LVLH) frame of the chaser. The target relative position, denoted as $\boldsymbol{\rho}$, and relative velocity, $\dot{\boldsymbol{\rho}}$, are defined in the chaser LVLH frame, expressed as follows:

$$\boldsymbol{\rho} = x\,\hat{\mathbf{i}} + y\,\hat{\mathbf{j}} + z\,\hat{\mathbf{k}}$$

In this context, $x$, $y$, and $z$ are the components of the position vector $\boldsymbol{\rho}$ within the chaser LVLH frame, and $\hat{\mathbf{i}}$, $\hat{\mathbf{j}}$, and $\hat{\mathbf{k}}$ represent the respective unit vectors of the reference frame. Thus, the equations of relative motion of the target can be written in the following way:

$$\begin{aligned}
\ddot{x} &= 2\dot{\nu}\dot{y} + \ddot{\nu}\,y + \dot{\nu}^2 x - \frac{\mu\,(r_c + x)}{\left[(r_c + x)^2 + y^2 + z^2\right]^{3/2}} + \frac{\mu}{r_c^2} \\
\ddot{y} &= -2\dot{\nu}\dot{x} - \ddot{\nu}\,x + \dot{\nu}^2 y - \frac{\mu\,y}{\left[(r_c + x)^2 + y^2 + z^2\right]^{3/2}} \\
\ddot{z} &= -\frac{\mu\,z}{\left[(r_c + x)^2 + y^2 + z^2\right]^{3/2}}
\end{aligned}$$
Figure 1 gives a visual representation of the translational dynamics [37].
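As an illustration, the standard nonlinear LVLH relative dynamics can be written as a state-derivative function suitable for numerical propagation. The axis convention (x radial, y along-track, z cross-track) and all names in this sketch are assumptions of the example, not the exact simulation code:

```python
import numpy as np

MU = 3.986004418e14  # Earth's gravitational parameter [m^3/s^2]

def relative_accel(state, r_c, nu_dot, nu_ddot):
    """Right-hand side of the nonlinear LVLH relative translational dynamics
    (x radial, y along-track, z cross-track; eccentric chaser orbit)."""
    x, y, z, vx, vy, vz = state
    r_t = np.sqrt((r_c + x)**2 + y**2 + z**2)  # distance of the target from Earth
    ax = 2*nu_dot*vy + nu_ddot*y + nu_dot**2*x - MU*(r_c + x)/r_t**3 + MU/r_c**2
    ay = -2*nu_dot*vx - nu_ddot*x + nu_dot**2*y - MU*y/r_t**3
    az = -MU*z/r_t**3
    return np.array([vx, vy, vz, ax, ay, az])
```

With a circular-orbit rate and a zero relative state, the relative acceleration vanishes, which is a quick sanity check for this kind of model.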
2.1.3. Rotational Dynamics
The relative orientation of the body-fixed frame on the target with respect to the body-fixed frame of the chaser can be represented with a rotation matrix $\mathbf{R}$. Thus, the relative angular velocity $\boldsymbol{\omega}$ in the target body-fixed reference frame depends on the angular velocity of the chaser $\boldsymbol{\omega}_c$ and the angular velocity of the target $\boldsymbol{\omega}_t$, both represented in their own body-fixed reference frames:

$$\boldsymbol{\omega} = \boldsymbol{\omega}_t - \mathbf{R}\,\boldsymbol{\omega}_c$$
The relative attitude of the target is established by the parametrization of the rotation matrix $\mathbf{R}$. The Modified Rodrigues Parameters (MRPs) are utilized as a minimal set of three parameters that allows singularities to be overcome and the rotation to be described. The quaternion representing the orientation of ENVISAT can be expressed as $\mathbf{q} = [q_0, q_1, q_2, q_3]^T$, which consists of four components: a scalar one, $q_0$, and $q_1$, $q_2$, and $q_3$. The three-axis angular velocity of ENVISAT is represented by $\boldsymbol{\omega}_t$, with components $\omega_1$, $\omega_2$, and $\omega_3$. The MRP vector is obtained from the quaternion as

$$\boldsymbol{\sigma} = \frac{1}{1+q_0}\begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix}$$
where $\boldsymbol{\sigma}$ is the three-dimensional MRP vector. The kinematic equation of motion can be derived using the target's relative angular velocity. Therefore, the time evolution of the MRPs is described as follows:

$$\dot{\boldsymbol{\sigma}} = \frac{1}{4}\left[\left(1-\boldsymbol{\sigma}^T\boldsymbol{\sigma}\right)\mathbf{I}_3 + 2\left[\boldsymbol{\sigma}\times\right] + 2\,\boldsymbol{\sigma}\boldsymbol{\sigma}^T\right]\boldsymbol{\omega}$$

Note that $\mathbf{I}_3$ is the $3\times3$ identity matrix, and $[\boldsymbol{\sigma}\times]$ identifies the cross product matrix, given as follows:

$$[\boldsymbol{\sigma}\times] = \begin{bmatrix} 0 & -\sigma_3 & \sigma_2 \\ \sigma_3 & 0 & -\sigma_1 \\ -\sigma_2 & \sigma_1 & 0 \end{bmatrix}$$
The rotation matrix $\mathbf{R}$ that connects the chaser body-fixed frame and the target body-fixed frame can thus be derived as follows:

$$\mathbf{R} = \mathbf{I}_3 + \frac{8\left[\boldsymbol{\sigma}\times\right]^2 - 4\left(1-\boldsymbol{\sigma}^T\boldsymbol{\sigma}\right)\left[\boldsymbol{\sigma}\times\right]}{\left(1+\boldsymbol{\sigma}^T\boldsymbol{\sigma}\right)^2}$$
The torque-free Euler equations describe the absolute rotational dynamics of the chaser. The relative attitude dynamics are obtained by substituting the kinematics relationship into the Euler absolute equations of the target spacecraft. Note that $\mathbf{J}_t$ is the matrix of inertia of the target, and the resulting equations include the apparent, gyroscopic, and chaser-inertial torque terms.
2.2. ENVISAT Satellite Simulation
Detailed models of the satellite’s orientation and movement patterns, replicating the complex dynamics encountered in orbit, are simulated in this section. By leveraging computational tools, we were able to create accurate ground truth data for both the pose and marker locations on the satellite by propagating the equations of motion of the chaser and target satellites using the Runge–Kutta–Fehlberg method. This simulation process was critical for establishing a reliable dataset that mirrors the real-world conditions the satellite would experience in space.
Figure 2 visualizes the ENVISAT satellite and its geometrical dimensions used within the simulation environment (commas are decimal separators). European Space Agency online sources provide details about ENVISAT's mass, the location of its center of mass, and its moments of inertia, as well as its geometric center and volume [38].
Figure 3 shows a series of simulated images of the ENVISAT satellite. Note that the solar panel is not included in the simulation; only the main rectangular body is modeled for assessing the accuracy of the corner detection algorithm and for verification. After each simulation, the 12-component state vector (the relative position between the target and chaser centers of mass, the relative velocity of the center of mass, the MRPs for attitude, and the angular velocities) and the true locations of all the corners (the ground truth for the corner detection algorithm) are stored.
2.3. Data Preparation
Upon completion of the simulation, the generated images, which encapsulate the precise pose and marker information of the satellite, were transferred to a Python environment for further processing. The transition from MATLAB R2024b to Python 3.13.3 was essential for integrating the simulation data with the image processing and machine learning pipeline that follows. In Python, one of the initial steps involved converting the RGB images obtained from the simulation into gray-scale. This conversion is pivotal for the subsequent corner detection process, as working with gray-scale images reduces computational complexity and focuses the analysis on the structural information relevant to marker detection. By eliminating the color data, we can enhance the efficiency and accuracy of the CNN in identifying the critical corner locations on the satellite. The streamlined dataset, now optimized for neural network processing, sets the stage for the effective application of machine learning techniques in detecting the markers essential for the satellite’s pose estimation and debris removal strategy.
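The gray-scale conversion step can be illustrated with a minimal sketch; the BT.601 luma weights used here are a common choice and an assumption of this example, not necessarily the exact conversion used in the pipeline:

```python
import numpy as np

def rgb_to_gray(img):
    """Convert an H x W x 3 RGB image to gray-scale using ITU-R BT.601 luma
    weights; the weights sum to one, so intensity range is preserved."""
    weights = np.array([0.299, 0.587, 0.114])
    return img[..., :3] @ weights
```

Dropping the two extra channels reduces the CNN input size by a factor of three while keeping the edge and corner structure intact.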
5. The Measurement Model
The observations provided to the filter use a measurement model that combines the translational and rotational states. However, the propagation of the dynamics can still be separated into translational and rotational components, resulting in faster and more efficient estimation of the relative translational states and velocities and of the relative rotational states and angular velocities. This approach allows the state vector, although 12 components long, to be divided into two separate parts of 6 components each, propagating the translational and rotational models in parallel. The filter uses a 4th-order Runge–Kutta integrator for propagation, with the onboard application in mind. Before beginning the estimation, the filter uses information from the previous step to calculate the marker visibility. Depending on the simulation requirements, the filter can operate with the entire set of markers or limit the measurements to three markers, with the selection process explained later. Additionally, measurement failures can be included in the simulation to show the robustness of the proposed technique. Finally, the estimated relative states are compared with the true states, propagated with high accuracy, to evaluate the performance of the filter.
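The split propagation with a classical 4th-order Runge–Kutta step can be sketched as follows; the splitting helper is illustrative, not the exact onboard code:

```python
import numpy as np

def rk4_step(f, t, x, dt):
    """One classical 4th-order Runge-Kutta step for x' = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + dt / 2, x + dt / 2 * k1)
    k3 = f(t + dt / 2, x + dt / 2 * k2)
    k4 = f(t + dt, x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def propagate_split(f_trans, f_rot, t, state12, dt):
    """Propagate the 12-component state as two independent 6-component halves
    (translational states first, rotational states second)."""
    trans = rk4_step(f_trans, t, state12[:6], dt)
    rot = rk4_step(f_rot, t, state12[6:], dt)
    return np.concatenate([trans, rot])
```

Because the two halves are decoupled under the stated assumptions, propagating them separately gives the same result as propagating the full 12-state vector, at a lower cost per evaluation.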
5.1. Marker Creation
Filters require accurate measurements to effectively correct the predicted values and accurately determine the target's attitude. Most filters for space applications depend on camera image processing. In practical scenarios, the image processing software is configured to identify target points, known as markers, in each captured image. The software processes the camera image, identifies the marker positions, and then transmits this information to the filter. Selecting markers is a complex task influenced by the target's shape, volume, and color, as the image processing must be rapid. Commonly, target corners are selected as markers, utilizing reliable corner detection algorithms like the Harris–Stephens [45] and Förstner algorithms. Effective interaction between the filter and the image processing software is crucial. After an initial period during which the first measurements are taken and the position error rapidly decreases, the communication should be optimized to expedite marker estimation. Once the filter completes its iterative cycle, it can inform the camera where to search for markers in the subsequent image. This enables the camera software to analyze a smaller image region, reducing the need to process all pixels and focusing only on those near the predicted marker positions. In the context of the ENVISAT relative pose estimation problem, the corners of the main body are used as markers and their positions are tracked over time. The marker positions offer valuable information, since the position of each marker is well known with respect to the center of mass; by tracking the trajectory of these markers, the filter can reconstruct the state of the spacecraft and calculate its relative position and velocity. The main body of ENVISAT, excluding the solar panel, can be modeled as a simple parallelepiped with eight corners. These corners are selected as the filter markers, and their positions relative to the center of mass are known. Each marker is identified by a letter, resulting in markers labeled A, B, C, D, E, F, G, and H, as reported in previous studies [37].
5.2. Camera Transformation and 3D Reconstruction from 2D Pixel Data
To accurately estimate the 3D positions of the markers on the satellite, it is necessary to transform the 2D pixel coordinates detected by the CNN module into 3D space. This transformation is achieved using both the intrinsic and extrinsic camera parameters.
Figure 7 visually demonstrates how the camera transformation works from world coordinates to pixel coordinates. Note that the selected camera model is a pinhole camera model [49]. In the figure, $X_w$, $Y_w$, and $Z_w$ are the world coordinates of the satellite corners; $X_c$, $Y_c$, and $Z_c$ are the coordinates of the satellite's corners in the camera frame; $x$ and $y$ are the coordinates of the satellite's corners in the image frame; and $u$ and $v$ are the pixel locations to be detected by the CNN.
5.2.1. Intrinsic Camera Parameters
The intrinsic parameters describe the internal characteristics of the camera, including its focal length, principal point, and pixel scaling. These parameters are represented by the camera intrinsic matrix $\mathbf{K}$, a $3\times3$ matrix that maps the 3D coordinates of a point in the camera's coordinate system to 2D image plane coordinates. The intrinsic matrix is defined as:

$$\mathbf{K} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

where $f_x$ and $f_y$ are the focal lengths in the x and y directions, respectively. The parameters $c_x$ and $c_y$ represent half the width and height of the image plane, respectively. In the pinhole camera model, they are used to ensure that the $u$ and $v$ coordinates are always positive, shifting the image coordinate system so that $u$ and $v$ range over the full image width and height rather than over symmetric intervals around zero. This matrix is applied to the 3D points in camera space to project them onto the 2D image plane, producing pixel coordinates. The CNN module detects these pixel locations for the satellite's corners.
5.2.2. Extrinsic Camera Parameters
The extrinsic parameters define the position and orientation of the camera in the world coordinate system. These parameters are encapsulated in the extrinsic transformation matrix $[\mathbf{R} \mid \mathbf{t}]$, where $\mathbf{R}$ is a $3\times3$ rotation matrix (from the MRPs) and $\mathbf{t}$ is a $3\times1$ translation vector. The extrinsic matrix transforms the 3D world coordinates of the satellite into the camera's coordinate system. Therefore, the transformation from world coordinates $\mathbf{p}_w$ to camera coordinates $\mathbf{p}_c$ is given by:

$$\mathbf{p}_c = \mathbf{R}\,\mathbf{p}_w + \mathbf{t}$$
5.2.3. Conversion from 2D Pixel Coordinates to 3D World Coordinates
To reconstruct the 3D positions of the satellite's markers from 2D image coordinates, a standard camera transformation process is employed. Starting from the 2D pixel coordinates $(u, v)$ detected by the CNN module, the following transformation is applied. Using the intrinsic camera matrix $\mathbf{K}$, the 2D pixel coordinates $(u, v)$ are mapped to camera coordinates $\mathbf{p}_c$ as follows:

$$\mathbf{p}_c = d\,\mathbf{K}^{-1}\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$

Here, $d$ is the depth of the point, and $\mathbf{K}^{-1}$ is the inverse of the intrinsic matrix. This step translates the 2D pixel coordinates into the camera's 3D space by introducing the depth information, obtained as part of the relative position state estimate.
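The back-projection step can be sketched as follows, assuming the depth is supplied externally (in this work, from the state estimate); the function name is illustrative:

```python
import numpy as np

def backproject_pixel(K, u, v, depth):
    """Recover camera-frame 3D coordinates from a pixel (u, v) and a known
    depth: p_cam = depth * K^-1 [u, v, 1]^T."""
    return depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
```

Back-projecting a pixel and then re-projecting it with the same intrinsic matrix returns the original pixel, so the two operations are exact inverses for a fixed depth.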
5.2.4. Transformation to World Coordinates
Once the 3D coordinates in the camera space are computed, the inverse of the extrinsic transformation is used to convert these into world coordinates:

$$\mathbf{p}_w = \mathbf{R}^{T}\left(\mathbf{p}_c - \mathbf{t}\right)$$

By applying these transformations, the detected 2D pixel coordinates from the CNN module are converted into 3D world coordinates, providing the measurements that enable accurate state estimation of the satellite's relative position and orientation.
Figure 8 visually summarizes the transformations between coordinate frames.
5.3. Measurement Equations
The state vector has 12 components that can be divided into 4 parts. Each part, composed of three elements, describes one aspect of the pose of the target satellite, ENVISAT.
Note that the full simulation works with 18 states, since the absolute states, i.e., the chaser's orbital position and velocity with respect to the ECI frame, are propagated as well. The four key components to be tracked are the relative position between the centers of mass of the target and chaser, the relative velocity of the center of mass, the Modified Rodrigues Parameters (MRPs), and the angular velocities. It is therefore necessary to evaluate the marker positions based on the predicted state, so that the predicted measurements can be compared with the actual measurements obtained from the camera system during the update step of the filter. Let $\boldsymbol{\rho}$ represent the position vector of the chaser's center of mass relative to ENVISAT's center of mass. The measurements for each marker are calculated individually as follows: the marker position vector $\mathbf{m}_i$ is initially expressed in the target's reference frame, known from the available CAD models of ENVISAT. This vector is then transformed into the chaser's reference frame by multiplying it with the rotation matrix $\mathbf{R}$. Finally, the position measurement of each marker relative to the chaser is determined through a simple vector difference:

$$\mathbf{y}_i = \mathbf{R}\,\mathbf{m}_i - \boldsymbol{\rho}$$
Here, $\mathbf{y}_i$ represents the position measurement of a marker relative to the chaser's center of mass, and the rotation matrix $\mathbf{R}$ is derived from the MRPs, meaning that the measurement model couples the translational and rotational states.
Figure 9 demonstrates how the vectors are arranged in the measurement equations [37].
5.4. Measurement Noise Covariance Calculation
A statistical estimation procedure was developed, based on repeated TinyCornerNET detections over a single image, to obtain a physically meaningful measurement noise covariance matrix $\mathbf{R}$ that reflects real-world sensor uncertainty. We derive $\mathbf{R}$ empirically from repeated runs of TinyCornerNET under simulated variability, resulting in a data-driven characterization of the uncertainty of the 3D measurements rather than arbitrary or constant values. The procedure starts with a chosen image projected from the satellite simulation, along with the corresponding ground truth corner locations and depths. TinyCornerNET detects the 2D corner positions over multiple iterations with randomized sensor perturbations. These perturbations, which mimic real-world phenomena like camera shake, sensor noise, and image deterioration, include affine jitter, random noise injection, and controlled blur. Each set of detected 2D pixel positions is transformed into 3D coordinates. For each marker $i$, this process yields a distribution of 3D measurements $\{\mathbf{z}_i^{(k)}\}_{k=1}^{N}$, where $N$ is the number of Monte Carlo runs. The sample covariance of these measurements is then computed as follows:

$$\mathbf{R}_i = \frac{1}{N-1}\sum_{k=1}^{N}\left(\mathbf{z}_i^{(k)} - \bar{\mathbf{z}}_i\right)\left(\mathbf{z}_i^{(k)} - \bar{\mathbf{z}}_i\right)^T$$

where $\bar{\mathbf{z}}_i$ is the mean of the 3D positions of marker $i$. To capture a global characterization of the sensor noise model, the residuals across all markers are stacked, and a global covariance matrix $\mathbf{R}$ is estimated:

$$\mathbf{R} = \frac{1}{\left(\sum_i N_i\right)-1}\sum_{i}\sum_{k=1}^{N_i}\left(\mathbf{z}_i^{(k)} - \bar{\mathbf{z}}_i\right)\left(\mathbf{z}_i^{(k)} - \bar{\mathbf{z}}_i\right)^T$$
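The per-marker and global sample covariance computations can be sketched as follows; this is a generic unbiased-estimator implementation, not the exact analysis code:

```python
import numpy as np

def sample_covariance(samples):
    """Unbiased sample covariance of N stacked 3D measurements (N x 3 array)."""
    d = samples - samples.mean(axis=0)
    return d.T @ d / (len(samples) - 1)

def global_covariance(residuals_per_marker):
    """Stack the (already mean-removed) residuals of all markers and
    estimate one global 3x3 measurement covariance."""
    stacked = np.vstack(residuals_per_marker)
    return stacked.T @ stacked / (len(stacked) - 1)
```

Pooling residuals across markers trades per-marker detail for a better-conditioned estimate when the number of Monte Carlo runs per marker is limited.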
The resulting covariance matrix is used in the UKF initialization, so that the filter's confidence in the measurements reflects their actual uncertainty.
Figure 10 visualizes the CNN-detected marker spread around the ground truth for a sample image, where the ellipses represent confidence contours of the 2D covariance. Based on the TinyCornerNET detection statistics, the measurement noise covariance matrix for a single corner in 3D space was estimated through repeated randomized simulations. The resulting covariance captures the expected uncertainties, incorporating the effects of sensor noise, blur, and affine perturbations. The final estimated $\mathbf{R}$ matrix for a single corner is given as follows:

As shown, the off-diagonal elements reveal a correlation between the $x$ and $z$ components, while the $y$-axis (depth) remains uncorrelated. This structure reflects the physical imaging characteristics, where the depth uncertainty arises independently of the lateral uncertainties. Since each satellite corner marker is assumed to be independent, the full measurement noise matrix $\mathbf{R}$ for all markers is constructed by stacking the individual $3\times3$ blocks along the diagonal, forming a $3m\times3m$ covariance matrix for the measurement vector when $m$ visible markers are detected.
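The block-diagonal stacking of identical per-corner blocks can be written compactly with a Kronecker product; this is an illustrative sketch:

```python
import numpy as np

def stack_measurement_covariance(R_corner, m):
    """Full measurement covariance for m independent visible markers:
    m copies of the per-corner 3x3 block along the diagonal (3m x 3m)."""
    return np.kron(np.eye(m), R_corner)
```

The zero off-diagonal blocks encode the assumption that detection errors at different corners are uncorrelated.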
5.5. Markers Visibility
The presented measurement model relies on the positions of the eight corners of ENVISAT's main body. However, the camera cannot capture all marker positions in a single frame because some markers are obscured by ENVISAT's structure itself. As a result, the filter cannot use the entire set of markers simultaneously; it has to adjust its measurements based on the visible markers. Consequently, the size of the measurement vector varies depending on the number of visible markers. Each marker provides three position measurements as an observation, so the measurement vector $\mathbf{y}$ will have $3m$ components, where $m \le 8$. The visible markers are located on the visible faces of the satellite, which are easier for the CNN to detect. Thus, the detection process focuses on identifying these visible faces for marker extraction. The requirement for face visibility is determined by the following condition:

$$\boldsymbol{\rho} \cdot \hat{\mathbf{n}}_i < 0$$

where $\hat{\mathbf{n}}_i$ defines the $i$-th face of the main body. If the scalar product between the relative position vector of the chaser and the target, $\boldsymbol{\rho}$, and the unit vector perpendicular to the face, $\hat{\mathbf{n}}_i$, is negative, it indicates that the face is oriented towards the camera, making the markers on that face visible. The filter uses the state information at the beginning of each observation to predict which markers will be visible in the next step, preparing to receive the correct number of measurements from the camera. The filter hardcodes the mapping between ENVISAT's faces and the corresponding markers according to the vectors $\hat{\mathbf{n}}_i$, thereby predicting marker visibility. The filter handles marker visibility in a binary manner, assigning a value of one if a marker is visible and zero if it is not, so that it can adapt the measurement dimensionality.
5.6. Marker Association
Once the CNN detects the potential marker positions, these measurements are compared to the predicted locations obtained using Equation (44) for each marker through the filter's estimated states. The association process is refined by minimizing the total distance between the predicted and measured positions, computed as a Euclidean distance, to ensure that the predicted marker locations are matched as closely as possible to the measured ones. Mathematically, this operation can be expressed by constructing a cost function, $J$, that minimizes the squared distance between the predicted marker locations, $\hat{\mathbf{y}}_i$, and the measured marker locations, $\mathbf{z}_j$. The cost function $J$ selects the pairings $(i, j)$ that minimize the total squared distance, effectively associating each predicted marker with the closest measured marker. This can be formulated as follows:

$$J = \min_{\pi} \sum_{i} \left\| \hat{\mathbf{y}}_i - \mathbf{z}_{\pi(i)} \right\|^2$$
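Since at most eight markers are involved, the minimizing assignment can even be found by brute force over permutations, as in this illustrative sketch (a Hungarian-algorithm solver would scale better for larger problems):

```python
import itertools
import numpy as np

def associate_markers(predicted, measured):
    """One-to-one pairing minimizing the total squared distance between
    predicted and measured marker positions (brute force over permutations,
    tractable for the <= 8 markers considered here)."""
    n = len(predicted)
    best_perm, best_cost = None, np.inf
    for perm in itertools.permutations(range(n)):
        cost = sum(float(np.sum((predicted[i] - measured[j]) ** 2))
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost
```

Enforcing a one-to-one pairing, rather than matching each prediction to its nearest neighbor independently, prevents two predicted markers from claiming the same detected corner.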
Figure 11 visually demonstrates the marker association process, which uses the previous equation to match specific markers with the detected corners. Dashed yellow lines represent the distance to the second-closest corner, while the solid orange line shows the shortest path to a specific corner. The marker association optimization ensures that the markers identified by the CNN are associated with the correct position vector.
6. Numerical Results
Before discussing the results of this work, it is appropriate to summarize all the components presented up to this point in an end-to-end pipeline. Therefore, Figure 12 provides a pipeline diagram of the method discussed in this work. Moreover, as shown in Figure 9, ENVISAT's center of mass is offset from the geometric center, lying closer to one of the solar panel sides. This asymmetry in the density distribution improves the robustness of the marker association during the initial and transient parts of the filtering simulation.
Note that in the estimation framework, the initial uncertainties are accounted for by introducing random noise into all states. Specifically, normally distributed perturbations are applied to these states, with varying levels of uncertainty reflecting their initial conditions, as given in Table 2. For the position state, a normal distribution with a standard deviation of 1.0 is used, indicating relatively high uncertainty in the initial position estimate. The velocity state is subjected to a tighter distribution, the MRPs, representing the satellite's attitude, are modeled with moderately high uncertainty in the initial orientation, and the angular velocity state is assigned its own standard deviation; the specific values are listed in Table 2. The UKF parameters $\alpha$, $\beta$, and $\kappa$ are chosen based on standard guidelines for UKF performance and the system dynamics. All other simulation-related parameters, such as the inertia tensor and mass of ENVISAT, are taken from the literature [50].
Our initial findings demonstrate promising outcomes in the detection and analysis of structural markers on the ENVISAT satellite through processed imagery. This approach not only enhances the robustness of our detection algorithms under varied operational scenarios but also aligns with the typical image quality captured by space-borne sensors. Each simulated image undergoes a preprocessing phase in which Gaussian noise and blur are applied to test the robustness and performance of the software. The Gaussian noise is intended to mimic the electronic noise and sensor imperfections typically found in spacecraft imaging systems, simulating the random variations in pixel intensity that occur due to various factors. Simultaneously, a Gaussian blur is applied to replicate the blurring caused by minute focusing discrepancies or motion effects that can occur in a spacecraft's optical system [12].
Table 3 shows the camera and simulation parameters selected within this study.
Figure 13 shows preliminary results of the CNN corner detection algorithm. After marker identification, the measurements produced by the CNN are fed into the UKF framework as the filter measurements for relative pose estimation. As a reminder, it is assumed that a priori knowledge of the geometrical and physical characteristics of both chaser and target is available, that both behave as rigid bodies, and that the chaser motion is deterministic, without external disturbances or control actions. The camera cannot detect all marker positions in a single frame because parts of ENVISAT's structure occlude some markers. Consequently, the filter does not operate on the entire set of markers but adjusts its measurement vector frame by frame based on which markers are visible. The visibility and correct association of a corner with its corresponding marker are crucial, as locating more markers tends to enhance estimation accuracy by increasing the observability of the system. Moreover, Gaussian noise and blur in the measurements further degrade estimation quality, introducing additional variability and error that affect the overall accuracy of the position and orientation estimates.
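The frame-by-frame measurement assembly described above can be sketched as follows; the marker coordinates and the visibility mask are hypothetical examples, not the study's data.

```python
import numpy as np

def assemble_measurements(pixel_coords, visible):
    """Stack only the currently visible marker detections into the filter's
    measurement vector, returning the vector and the marker indices used so
    that each measurement stays associated with its 3D marker."""
    idx = np.flatnonzero(visible)
    z = pixel_coords[idx].reshape(-1)  # (2 * n_visible,) as [u1, v1, u2, v2, ...]
    return z, idx

# Hypothetical frame: 8 markers, 5 of which are visible to the camera.
coords = np.arange(16, dtype=float).reshape(8, 2)  # (u, v) per marker
vis = np.array([1, 0, 1, 1, 0, 1, 0, 1], dtype=bool)
z, used = assemble_measurements(coords, vis)
print(z.size, used.tolist())  # 10 [0, 2, 3, 5, 7]
```

The returned index list lets the UKF evaluate its measurement model only for the markers actually observed in that frame.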
In order to understand the effects of measurement noise and the number of visible markers, Monte Carlo runs were executed within the UKF framework. Note that in all
Figure 14,
Figure 15 and
Figure 16, the red dashed lines represent the three-sigma values from the updated error covariance matrix of the filter, the solid blue lines are the three-sigma values obtained from the sampled errors of the Monte Carlo, and the gray lines are single Monte Carlo runs of the simulation. The effective (sampled) errors are computed as the root-sum-square (RSS) differences between the true states and the estimated mean states. Note that the effective error of the Monte Carlo simulation represents the actual error spread of the filter, calculated by running multiple simulations with different initial conditions. The estimated errors, derived from the covariance matrix, are computed as the square root of the sum of the diagonal elements corresponding to the filter states. The difference is clearly visible in
Figure 14 and
Figure 15, where the standard deviation of the measurement noise (zero-mean Gaussian) for the sensors is increased from
to
. These values are selected to simulate mild to moderate noise on digital, simulated images. This dual impact of fewer markers and increased noise highlights the challenges in accurately determining the satellite’s state under suboptimal conditions. These results underscore the importance of maximizing the number of observed markers and minimizing noise to achieve high-quality estimation of the satellite’s relative position and orientation. Finally, in
Figure 16, it is evident that with fewer markers observed throughout the time period, the estimation accuracy is also decreased.
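The two error bounds compared in these figures can be sketched as follows; the ensemble size, noise level, and covariance values are illustrative, not those of the study.

```python
import numpy as np

rng = np.random.default_rng(7)

def sampled_three_sigma(true_states, est_states):
    """Effective bound from Monte Carlo: RSS difference between true and
    estimated states per run and time step, then 3x the spread across runs."""
    err = np.linalg.norm(true_states - est_states, axis=-1)  # (runs, steps)
    return 3.0 * err.std(axis=0)

def estimated_three_sigma(P_diag):
    """Filter's own bound: 3x the square root of the summed diagonal
    covariance elements corresponding to the states of interest."""
    return 3.0 * np.sqrt(P_diag.sum(axis=-1))

# Hypothetical ensemble: 200 runs, 50 time steps, 3 position states.
truth = np.zeros((200, 50, 3))
est = truth + rng.normal(0.0, 0.2, truth.shape)
mc_bound = sampled_three_sigma(truth, est)                 # per-step, from samples
ukf_bound = estimated_three_sigma(np.full((50, 3), 0.04))  # per-step, from P (sigma^2 = 0.04)
```

A consistent filter shows the two bounds close to each other, with the individual gray Monte Carlo runs enclosed by both.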
Again, in
Figure 14,
Figure 15 and
Figure 16, the red dashed lines represent the three-sigma values derived from the estimated error covariance matrix, and the solid blue lines show the effective three-sigma values obtained from the sampled errors; the two bounds lie close to each other and encapsulate the Monte Carlo errors. Consequently, the UKF performs well, maintaining estimation accuracy despite the challenges posed by reduced marker observations and other uncertainties in the simulation, such as the measurement noise on the sensors. The filter is able to predict its own uncertainties, and the initially large error covariance is rapidly reduced to steady-state values. To show convergence, the figures are zoomed in around the steady state, with the initial large uncertainties outside the plotted range (for visualization purposes). Error spikes are visible around some points, especially for the rotational position and velocity errors. The fundamental reason for this phenomenon is the positioning and orientation of ENVISAT in the camera frame, which reduces observability. It is important to note that the filter's robustness and accuracy are sensitive to the filter frequency, which is set to 1 Hz in this study. The filter tends to perform less accurately below this frequency threshold, while it remains convergent above it, where sampling-rate and discretization errors are smaller.
Robustness to Missing Observations: The Eclipse
For the ENVISAT satellite, the eclipse duration was calculated to be 35.57 min. This eclipse occurs when the satellite passes into the Earth's shadow, during which the Sun's light is blocked and no direct sunlight reaches the satellite. Disregarding possible detection under other light sources, a worst-case scenario is assumed in which, during this eclipse period, the corners and markers of the satellite are never visible to the cameras used for measurement. As a result, there are no measurement updates in the UKF during this time. The UKF relies on visual tracking to update the satellite's state estimates, and without sufficient visibility, the necessary data for these updates are unavailable. Therefore, during the eclipse period, the estimation process continues without measurement updates, relying solely on the prediction step to propagate the relative pose uncertainties.
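The effect of prediction-only propagation can be illustrated with a toy linear model (not the ENVISAT relative dynamics): the covariance grows monotonically while no updates arrive, here over roughly the 35.57 min eclipse at the study's 1 Hz filter rate.

```python
import numpy as np

def propagate_covariance(P, F, Q, steps):
    """Prediction-only covariance propagation (no measurement updates),
    as happens while the target is in eclipse."""
    for _ in range(steps):
        P = F @ P @ F.T + Q
    return P

# Toy 2-state constant-velocity model (hypothetical, for illustration only).
dt = 1.0                                   # 1 Hz filter rate, as in the study
F = np.array([[1.0, dt], [0.0, 1.0]])
Q = np.diag([1e-4, 1e-5])
P0 = np.diag([0.1, 0.01])
P_end = propagate_covariance(P0, F, Q, steps=int(35.57 * 60))  # ~eclipse length
print(np.trace(P_end) > np.trace(P0))  # True
```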
Figure 17 visualizes the eclipse phase where the Sun is inserted far away in the
axis.
In order to mitigate the impact of missing measurements and maintain estimation quality, we introduce and adjust the process noise covariance matrix
during the eclipse, which starts around t = 2000 s of the simulation and lasts approximately 35.57 min. While measurements are unavailable, the fidelity of the uncertainty estimate rests entirely on the propagation step. In an ideal scenario, the predicted PDF, approximated as Gaussian by the UKF, would perfectly describe the true non-Gaussian PDF, whose shape is unknown. To avoid losing custody of the target with a highly confident success rate, the UKF predicted covariance must guarantee that the full uncertainty of the true PDF is enclosed in its prediction. This is achieved through covariance inflation, which ensures that the algorithm counteracts distributions with long tails and many outliers, as typically result from highly nonlinear dynamics (such as ENVISAT's). The matrix
reflects uncertainties in the system dynamics, and increasing its values during the eclipse according to the covariance-inflation methodology helps ensure that the predicted uncertainty adequately encompasses the potential spread of the state. This inflation prevents the filter from becoming overconfident in its predictions and helps retain robustness under high nonlinearity and absent measurements. As the system is highly nonlinear and the distribution becomes non-Gaussian, the noise inflation prevents degeneracy from outlier observations or states, providing robustness against every possible realization. As shown in the results, the covariance inflation guarantees that every possible Monte Carlo simulation, not just in a
boundary sense, is enclosed in the uncertainty to guarantee that the UKF does not lose fidelity and custody of the target. Once the satellite exits the Earth’s shadow and the eclipse ends, measurement updates resume, and the UKF rapidly reduces accumulated prediction error by incorporating sensor data. To assess the filter’s consistency under these conditions, Monte Carlo simulations were conducted.
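A minimal sketch of the eclipse-triggered inflation logic; the inflation factor, the nominal Q values, and the helper names are hypothetical, as the paper does not state the inflation magnitude here.

```python
import numpy as np

def eclipse_flag(t, t_start=2000.0, duration=35.57 * 60.0):
    """True while the simulation time falls inside the eclipse window
    (start time and duration taken from the text)."""
    return t_start <= t < t_start + duration

def process_noise(Q_nominal, in_eclipse, inflation=10.0):
    """Return the process noise used at this step: the nominal Q, or an
    inflated copy while measurements are unavailable. The inflation factor
    is a hypothetical placeholder, not the value used in the paper."""
    return Q_nominal * inflation if in_eclipse else Q_nominal

# Hypothetical diagonal Q for the 12 states [pos, vel, mrp, omega].
Q0 = np.diag([1e-6] * 3 + [1e-7] * 3 + [1e-4] * 3 + [1e-5] * 3)
Q_used = process_noise(Q0, eclipse_flag(2500.0))
print(np.allclose(Q_used, 10.0 * Q0))  # True: t = 2500 s is inside the eclipse
```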
Figure 18 compares estimated covariances for eclipse and no-eclipse cases, where solid blue lines denote three-sigma bounds from the estimated covariance under eclipse, red dashed lines represent the no-eclipse case, and gray lines show the Monte Carlo errors. In
Figure 19,
Figure 20,
Figure 21 and
Figure 22, red dashed lines show three-sigma values from the sampled errors, solid blue lines represent the three-sigma values from the estimated covariance, black lines show the mean error, and gray lines denote the Monte Carlo realizations.
The results demonstrate that the filter remains consistent and accurate in tracking the satellite’s state under nominal conditions. Notably, during the eclipse period, when no measurements are available because of the occlusion of satellite markers, an adjustment of the process noise covariance matrix is implemented. Despite the lack of direct measurement updates during the eclipse, the Monte Carlo errors remain within the expected bounds, indicating that the
adaptation preserves the filter’s accuracy and robustness throughout the eclipse period. The matrix
is a diagonal matrix in which each diagonal element represents the process noise for a specific state of the system. The first three values
correspond to position states and have negligible process noise. The next three values
are for the translational velocity states, reflecting low uncertainty. The following three values
are for the rotational states, represented by the MRPs, indicating higher uncertainty. The last three values
correspond to the rotational velocity states, with moderate uncertainty. This diagonal structure allows for independent control of the noise levels for each state.
Table 4 shows the diagonal values of
.
Beyond the eclipse scenario, we also ran a Monte Carlo analysis for a case without any eclipse to establish a consistent baseline for comparison. The results are provided in
Appendix A of this study.
7. Discussion
This paper presents a robust approach to relative pose navigation for the tumbling ENVISAT spacecraft, emphasizing the online collaboration between a filter and a CNN-based image processing camera. As part of this study, we introduce TinyCornerNet, a lightweight CNN specifically designed for robust corner detection in spaceborne imagery. Unlike large-scale models, TinyCornerNet achieves high accuracy while maintaining computational efficiency, making it well-suited for onboard or real-time processing. Its architecture was optimized to reliably detect structural markers on satellite targets under various noise and distortion conditions. The results demonstrated that TinyCornerNet outperforms traditional corner detectors in both accuracy and robustness. In each cycle, the CNN processes the incoming images to extract corner measurements, which the filter then uses to predict and update the uncertainty of the system. Overall, the proposed pipeline not only addresses the complexity of autonomous rendezvous with a non-cooperative target but also lays a foundation for future advancements in space debris monitoring and active removal. Additionally, a key contribution of this work is the adjustment of the process noise covariance matrix via covariance inflation during eclipse periods. This adaptation ensures the continued accuracy of the UKF, allowing it to maintain state estimation despite the absence of direct sensor updates. Monte Carlo simulations, with and without eclipse scenarios, demonstrate the robustness and consistency of the proposed method. In eclipse conditions, the adjustment effectively compensates for the uncertainty introduced by the lack of measurements, keeping estimation errors within expected bounds. 
The reason is that the process noise covariance represents the uncertainty in the system's dynamics, specifically the degree to which the model's predictions may diverge from the true state due to unmodeled, unknown, or highly nonlinear dynamics, disturbances, or inaccuracies in the dynamic model. By increasing its values during the eclipse, the filter acknowledges the wider spread of uncertainty and adjusts its predictions accordingly. A higher process noise increases the predicted state covariance, allowing the filter to reflect the potential range of error in its predictions more accurately. This prevents the filter from becoming overly confident in its predictions, mitigating the risk of large deviations between the predicted and true states. Once measurements become available again, the filter quickly recovers, reducing the accumulated uncertainty and maintaining overall consistency. The scenario without an eclipse confirms the filter's nominal performance with continuous measurements. These findings suggest that the proposed approach offers a scalable, reliable solution for space debris management and contributes significantly to the advancement of autonomous de-orbiting systems.
In future work, we plan to integrate the CNN module directly within the UKF framework to further refine the pose estimation and tracking of space objects. This integration aims to overcome the limitations of current tracking systems by providing more accurate, rapid, and reliable state estimation under the complex and dynamic conditions of space environments. Moreover, the UKF will be improved by implementing an online adaptation of the measurement noise matrix rather than keeping it constant, an approach reported to improve estimation quality in several studies [
51,
52,
53]. Future work should also focus on incorporating stochastic absolute values into the estimation process and enhancing the interaction between the camera system and the filter. Introducing stochastic elements can account for uncertainties and variabilities in absolute measurements, thus improving the robustness and accuracy of state estimation. We also plan to use the full ENVISAT model with all its detailed features (solar panels, body panels, etc.) and to test the corner detection performance under these more realistic conditions, better evaluating the robustness of our method. Additionally, optimizing the interaction between the camera system and the filter can lead to more efficient processing, allowing the filter to guide the camera to focus on specific regions of interest, thus reducing computational load and improving real-time performance.
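As a sketch of the online measurement-noise adaptation mentioned above, the following estimates R as the sample covariance of recent filter innovations (innovation-based adaptive estimation); the window size, noise level, and function name are illustrative assumptions, not the method of the cited studies.

```python
import numpy as np

rng = np.random.default_rng(1)

def adapt_R(innovations, window=500):
    """Innovation-based estimate of the measurement noise covariance:
    the sample covariance of the most recent innovations. A hypothetical
    sketch of online R adaptation, not the paper's implemented method."""
    recent = np.asarray(innovations[-window:])
    centered = recent - recent.mean(axis=0)
    return centered.T @ centered / max(len(recent) - 1, 1)

# Hypothetical innovation sequence with true measurement sigma = 2 pixels
# per image axis (u, v); the estimated diagonal should approach sigma^2 = 4.
innov = rng.normal(0.0, 2.0, size=(500, 2))
R_hat = adapt_R(list(innov))
```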