Implicit Extended Kalman Filter for Optical Terrain Relative Navigation Using Delayed Measurements

: The exploration of celestial bodies such as the Moon, Mars, or even smaller ones such as comets and asteroids, is the next frontier of space exploration. One of the most interesting and attractive purposes from the scientiﬁc point of view in this ﬁeld, is the capability for a spacecraft to land on such bodies. Monocular cameras are widely adopted to perform this task due to their low cost and system complexity. Nevertheless, image-based algorithms for motion estimation range across different scales of complexities and computational loads. In this paper, a method to perform relative (or local) terrain navigation using frame-to-frame features correspondences and altimeter measurements is presented. The proposed image-based approach relies on the implementation of the implicit extended Kalman ﬁlter, which works using nonlinear dynamic models and corrections from measurements that are implicit functions of the state variables. In particular, here, the epipolar constraint, which is a geometric relationship between the feature point position vectors and the camera translation vector, is employed as the implicit measurement fused with altimeter updates. In realistic applications, the image processing routines require a certain amount of time to be executed. For this reason, the presented navigation system entails a fast cycle using altimeter measurements and a slow cycle with image-based updates. Moreover, the intrinsic delay of the feature matching execution is taken into account using a modiﬁed extrapolation method.


Introduction and Background
Vision-based navigation is becoming the most prominent solution for autonomous navigation for different scenarios [1][2][3][4][5]. Moreover, activities linked to the Lunar Gateway and Lunar villages have created a deep interest in the lunar environment [6][7][8]. Numerous future missions will entail landing operations to deploy technological systems on the surface of the Moon. The driving precision landing requirement for the autonomous landing is to land within ∼100 m of a predetermined location on the lunar surface, or any other planetary surface [9]. Traditional lunar landing approaches based on inertial sensing do not have the navigational precision to meet this requirement. The purpose of terrain navigation is to augment inertial navigation by providing position or bearing measurements relative to known surface landmarks, if available, or local motion estimate in unseen environments. From these measurements, the navigational precision can be reduced to a level that meets the ∼100 requirement. According to [9], there are mainly two different navigation functions: global position estimation, here referred as absolute navigation and local position estimation, here referred as relative navigation. These functions can be achieved with active range sensing or passive imaging. Among all the different alternatives presented in the comprehensive review paper in [9], we can point out the following highlevel clusters of approaches depending on the availability of a landmark/feature database: • Available database: we refer to a database when a set of known landmarks or features of the viewed terrain is cataloged, reporting their absolute location with respect to a planetocentric reference frame, e.g., ME lunar frame. The easiest example is the crater matching approach, in which known lunar craters are cataloged with their latitude and longitude with respect to a Lunar fixed frame. The detection and matching of one of the cataloged landmarks (or computer-based database features) yield a bearing measurement that can be processed to establish the global position of the spacecraft. This approach is robust and accurate but it requires at least a partial knowledge of the terrain before the spacecraft sees it. • Unavailable database: whenever the spacecraft is flying across unknown terrains, it is impossible to establish its global position from images only. Nevertheless, local (or relative) position, is still possible using frame-to-frame methods, which generally falls into the visual odometry domain in terrestrial robotics. Methods that fall within this classification differ between each other for increasing complexity level, along with increasing computational effort. In particular, the retrieval of a three dimensional description of the scene, e.g., the creation of a map in structure-from-motion-like methods, represent one of the major trade-off to be performed when assessing the feasibility of on-board implementation. These methods solve the motion estimation task by generating a 3D sparse map. A set of 2D-to-2D correspondences is obtained, relative pose between the frames is calculated and a sparse map of 3D points is initialized exploiting triangulation. The 2D features are then tracked for each subsequent frame and correlated to the 3D map: this way a set of 3D to 2D correspondences is obtained and used to solve the perspective-n-point problem, which along with a RANSAC routine set to delete incoming outliers (wrong match between target image and map), gives as a result a first estimate of the relative position of the camera.
In this paper, relative terrain navigation relies on frame-to-frame (2D-2D) motion estimation that works with extracted features from a pair of images, without creating any maps of the surrounding environment, coupled with an altimeter measurement. The reader is suggested to refer to [10] to deepen the knowledge on multi-view geometry and motion reconstruction. Once the features are tracked, a set of correspondences is available between the two frames. Feature correspondences are related by epipolar geometry [10], which completely describes the structure of the two consecutive camera poses and of the world points seen by them. This description is enclosed inside the essential matrix E for calibrated cameras, while fundamental matrix F holds for uncalibrated ones. Once one of these two matrices is available, motion can be retrieved up to scale by a simple algebraic decomposition. For particular cases in which the viewed scene is planar, feature correspondences may be also related by an homography matrix H, from which motion can be again extrapolated up to scale. Among the frame-to-frame alternatives that can be employed for on-board applications in relatively low-power devices [11], two alternatives represent the state-of-the-art in the literature.

Normalized 8-Point Algorithm
The normalized 8-point algorithm developed by Richard Hartley [10] is one of the simplest and widely diffused method to obtain a fundamental matrix F from a set of 8 features correspondences without knowing the intrinsic parameters of the camera. The algorithm simply involves the construction and least square solution of a set of linear equations, and uses normalized input data for better conditioning of the problem and stability of the result. The suggested normalization is a translation and scaling of each image so that the centroid of the reference points is at the origin of the coordinates and the RMS distance of the points from the origin is equal to √ 2. It is known [10] that for the fundamental matrix F the following constraint is always valid: This gives rise to a set of linear equations of the form A f = 0. If A has rank 8, then it is possible to solve for f up to scale, and this is the classical way of implementing the 8-point algorithm. In the case where the matrix A has rank seven, it is still possible to solve for the fundamental matrix by making use of the singularity constraint. The matrix F found by solving the set of linear equations will not in general have rank 2 (i.e., will not be exactly singular due to presence of noise in the extracted features x k−1 , x k ), consequently, steps to enforce this constraint are taken. The most convenient way to do this is to correct the matrix F found by the SVD solution from A. Matrix F is replaced by the matrix F that minimizes the Frobenius norm subject to the condition det(F ) = 0. In the normalized 8-point algorithm singularity constraint is enforced solving directly for F using SVD, which is simple and rapid. Exact matrix F is then finally retrieved denormalizing matrix F obtained from the normalized data. This method is direct and efficient to compute, but has anyway a major drawback, which is associated to F matrix itself. Independently from the method used in fact, fundamental matrix suffers from the so called "planar structure degeneracy". If the observed points from the camera lies on a plane (as it could be the case during a landing phase, where surface is seen from far away and appears to be flat) F is determined only up to three degree of freedom, which leads to a three-parameter family of possible fundamental matrices F (one of the parameters accounts for scaling the matrix so there is only a two-parameter family of homogeneous matrices). This ambiguity is not solvable and is a consequence of the fact that the camera intrinsic (K matrix) are not included in F.

5-Point Algorithm
Given a set of 2D feature correspondences, the most efficient solution to estimate the essential matrix E is represented by the five-point algorithm [10,12]. The problem is to find the possible solutions for relative camera pose between two calibrated views given five corresponding points. The algorithm consists of computing the coefficients of a tenth degree polynomial in closed form and subsequently finding its roots. Only relative positions of the points and cameras can be recovered, overall scale of the configuration represents an ambiguity and can never be extrapolated solely from images. Image points are represented by homogeneous 3-vectors in the first and second view, respectively. World points are represented as homogeneous 4-vectors X. This algorithm is thought to be used in conjunction with pre-emptive RANSAC in order to be more robust to the presence of outliers (i.e., false matches, wrong tracking) between the features. A number of random samples are taken, each containing five points. Five-point algorithm is applied to each sample and thus a number of hypotheses are generated. The best hypothesis is then chosen according to a robust measure over all the tracks and is in the end iteratively refined. One of the most important features of the five-point algorithm is that it works for any kind of configuration of the features considered, avoiding the planar structure degeneracy. This algorithm results to be efficient both in terms of accuracy and speed. In comparison to other state-of-the-art methods, despite being weaker for determining rotations, fivepoint works optimally for sideways motion and similarly for forward motion, although slightly worse when baseline between the two cameras is very small. Using the essential matrix also removes the projective ambiguity, which arises using fundamental matrix F, and provides a metric (or singular) reconstruction, which means the 3D points are true up to scaling alone, and not up to a projective transformation. Ideally, the output of the correspondent locations m k−1 , m k and the essential matrix is enough for a navigation filter to perform state estimation. As the number of feature grows, it may be convenient to pre-process such information to recover relative rotation and translation directly in the image processing block.
In principle the above algorithms delivers a relative pose of consecutive views of the camera, in the form of essential matrix for instance. In this way, direct relative translational vector would be input to the filter.
In this paper, the implicit extended Kalman filter (IEKF) receives a set of normalized Euclidean correspondences and the essential matrix. This vision-based approach is realized via an implementation of the IEKF, which is a variation of the classical Kalman filter that allows incorporating implicit functions as measurement constraints of the state variables. The epipolar constraint, which is a geometric relationship between the feature point position vectors and the camera translation vector, is employed as the implicit measurement in the Kalman filter. Basically, the algorithm is based on the epipolar constraint, which is manipulated by matrix decomposition to obtain relative translation and rotation. The core of this paper is to develop an algorithm that can be deployed on-board. The problem of deterministic delay due to the image processing time is taken into account using a customized extrapolation method that can fuse the high-frequency altimeter measurements with the low-frequency optical information, with an intrinsic delay of 1 s (Figure 1). Figure 1. Schematics of the required pre-processing steps to interface with the navigation filter. The figure shows the different possibilities for interfacing the feature-based image processing with the navigation filter with unknown landmarks and no 3D-map generation. In particular, the main difference is whether the essential matrix is reconstructed externally (e.g., 5-point algorithm) and used to retrieve the relative pose or it is embedded in the filter as the implicit measurement function.

Navigation Algorithm Description
The relative navigation reconstructs the relative state with respect to the defined ground reference system (GRS). The idea is to detect salient features, which may not be part of a database, and track them in subsequent frames as outlined in Figure 2.  Figure 1 showing the adopted baseline for this work in terms of IP-navigation interfaces.

Feature Detection & Matching
The pinhole camera model is adopted in this paper. Given the location of a detected feature ρ f ,B in the camera frame, which is equal to the body frame B b 1 ,b 2 ,b 3 | k for simplicity: the pinhole camera model expresses the location of the feature point in the focal plane as: where: The ORB-descriptors are employed to make the matching more robust. ORB descriptors assign an orientation to each feature such as left or right facing depending on how the levels of intensity change around that feature. For detecting intensity change, ORB uses intensity centroid. Given the assumption that a corner's intensity is offset from its center, the intensity centroid may be used to impute an orientation. To prevent the descriptor from being sensitive to high-frequency noise, BRIEF method smooths image using a Gaussian kernel. Subsequently, a pair of pixels is selected in a defined neighborhood around that feature. The defined neighborhood around pixel is known as a patch, which is a square of some pixel width and height.

Relative Navigation Filter
Algorithm 1 reports the most relevant steps of the EKF procedure in a pseudo-code format. The extended Kalman filter in Algorithm 1 represents the basis for the implementation of the algorithm; nevertheless, strong modifications have been made in the update step and the asynchronous measurement integration, described in Sections 4.2 and 4.4.

Algorithm 1 Extended Kalman Filter
State and measurement Jacobian matrices

Prediction Step
The prediction step of the implicit Kalman filter follows the classical formulation of Algorithm 1. It is important to remark that the distinction between absolute and relative navigation is the reference frame in which the state is reconstructed. In the absolute navigation, the state of the spacecraft is reconstructed with respect to the lunar fixed frame. The lunar fixed frame considered is the mean Earth/polar axis (ME) reference system, as shown in Figure 3, described in details in [13,14]. It defines the z-axis as the mean rotational pole. The prime meridian (0 degrees Longitude) is defined by the mean Earth direction. The intersection of the lunar equator and prime meridian occurs at what can be called the Moon's "mean sub-Earth point". The concept of a lunar "sub-Earth point" derives from the fact that the Moon's rotation is tidally locked to the Earth. The actual sub-Earth point on the Moon varies slightly due to orbital eccentricity, inclination, and other factors. So, a "mean sub-Earth point" is used to define the point on the lunar surface where Longitude equals 0 degrees. This point does not coincide with any prominent crater or other lunar surface feature. During these phases, orbital mechanics equations govern the motion. In case of relative navigation, the ground reference system (GRS) is used, depicted in Figure 4. In the case of a close approach during landing, distances, for both downrange and altitude, are small compared to the planet's radius; thus, the assumption of a constant gravity field with flat ground is appropriate. This assumption is widely used and accepted in the literature (see [16]). The translational dynamics of the spacecraft are expressed in a ground reference system (GRS), where z-axis is the altitude, x-axis downrange in the direction of flight and the consequent y-axis cross-range. The dynamics during the power descent phase is described by the equations: where T is the thrust vector delivered by the on-board propulsion, whose specific impulse is I sp . In synthesis, the filter state can be written as: The dynamical model is linear, hence the linearization is not required to compute the covariance propagation as: where the state transition matrix can easily be written as: This paper does not investigate thoroughly the influence of the dynamical model on the performance of the relative terrain estimation. The novelty lies on the processing of the frame-to-frame matched features. Regarding the dynamics to be employed, more accurate environmental representation can be used, such as the one described in reference [1]. A more accurate dynamical model would certainly be beneficial for the filter accuracy, nevertheless literature mostly reports applications in which constant gravity field is used for planetary landing [17]. The introduction of the intrinsic formulation coming from the epipolar constraint allows the filter to be directly fed with the frame-to-frame matching without the need for an additional pre-processing step to retrieve relative translation from sequential images (e.g., using the 5-points algorithm and essential matrix decomposition). The increase in time for propagating the a priori state with a more complex dynamics is negligible with respect to the above-mentioned pre-processing step.

Implicit Measurement Function: The Epipolar Constraint
The core of this paper is the development of an estimation algorithm that incorporates matched features directly, without augmenting the internal states of the filter. The state of the filter x is the estimate of the relative position r and velocity v vectors. The filter receives two elements for each matched features as measurement. Indeed, the locations of the n f matched features in the two successive camera frames (k − 1 and k) are stored in stacked variables, m k−1 and m k , whose columns: The frame-to-frame matched features represent the measurements at each time step. Nevertheless, it is not trivial to obtain a measurement function that links the filter states r and v to the matched features between two consecutive frames k − 1 and k. In principle, one would need an explicit function in the form of m k−1 ( x k−1 ), m k ( x k ). Each of these features are related by the epipolar constraint given by Equation (8). where in which R is the rotation matrix between the camera poses in two successive frames, namely from step k − 1 to k and t is the translation between the camera origin between two successive frames, expressed in the camera frame at step k, which is equal to the body frame B b 1 ,b 2 ,b 3 | k for simplicity. Typical space applications require the attitude to be reconstructed with respect to one inertial frame or, at least, a planetocentric reference frame. As mentioned, this paper investigates only relative position and velocity estimation from optical measurement; thus, it is assumed that an internal knowledge of the system attitude with respect to an inertial frame I i,j,k is known. The reference frames, relevant for visual odometry, are described in Figure 5. With reference to Figure 5, one can write the epipolar constraint as: where it is important to note that the translation between two camera frames is expressed in the body frame B b 1 ,b 2 ,b 3 | k , at time instant k. With the above definitions, following the development of [18,19], we can rearrange the epipolar constraint in Equation (8), recalling that we have n f epipolar constraints, each one following Equation (8). Thus, rearranging: where C k−1,k ∈ R n f ×9 is a matrix, whose rows correspond to the following vector referred to a given matched feature: and e = [e 11 e 21 e 31 e 12 e 22 e 32 e 13 e 23 e 33 ] is a vector of stacked columns of the epipolar matrix E. From the definition of the feature matrix C and the epipolar matrix E, it is straightforward to note the variable dependencies of such matrices. In other words, one can write: where q is the quaternion state, which is typically the attitude representation available on-board. By recalling that the filter stateˆ x = [ˆ r,ˆ v] we can explicitly write the implicit measurement as a function of the filter state. Using the above derivation, we can finally express the measurement that will be processed in the implicit Kalman filter: It is important to note that the true value of Equation (16) is always y k = 0 because it is a geometric constraint that is always satisfied. Using the estimated state to compute the essential matrix yields an innovation term that is defined as the difference between the true value y k = 0 and the implicit measurementˆ y k .

Correction Step
The implicit measurement derived in Section 4.2 is used in the correction step of the relative terrain navigation filter. In particular, with reference to Algorithm 1, the correction steps entails the implicit measurementˆ y k calculated using the current state estimateˆ x k and the set of matched features m k−1 and m k : The measurement update in the correction step of Algorithm 1 can be written as: The Kalman gain in Equation (20) is calculated through the standard formula using the measurement covariance matrix first order approximationR k . This is needed due to the implicit form of the measurement function, which derives from the noise covariance matrix of the measured feature localization R k . In particular the Kalman gain can be written: where the first-order approximation of the measurement covariance matrix is derived: in which the diagonal matrix: where the vector˜ Y k = [u 1,k−1 v 1,k−1 u 1,k v 1,k . . . u n f ,k−1 v n f ,k−1 u n f ,k v n f ,k ] combines the locations of all the n f matched features at two consecutive steps. In order to complete the calculation of the Kalman gain in Equation (21), the Jacobian matrix of the measurement function h needs to be computed as: The core correction step is represented by the optical measurements described so far. Nevertheless, landing navigation often uses high frequency altimeter measurements fused in the navigation filter. As mentioned, the altimeter frequency is eight times the optical one, namely f alt = 8 Hz. This means that the filter updates every t alt = 125 ms with altimeter measurements only and every t opt = 1 s with a major correction steps including the optical images. For the selected filter state, the measurement matrix for the altimeter measurements is fairly simple, because the vertical distance with respect to the terrain is equal to the z component of the position: that represents the measurement vector to be used in the Kalman update. Such vector is used alone during the fast cycles, and appended to the matrix H k of Equation (24) when the major slow cycle occurs. The issue with asynchronous updates due to the image processing computational time is explored in the next Section 4.4.

Delayed Measurements Integration
The navigation algorithm heavily relies on optical measurements. The information content is extracted from the images through the feature extraction and matching algorithms and the intermediate post-processing. Such process takes a finite amount of time that needs to be taken into account when fusing the measurements in the IEKF, especially for real-time applications. Indeed, when delayed measurements are present, at instant k the system receives a delayed measurement corresponding to time instant s (s = k − N, where N number of delay samples). There are various methods to consider the measurements delays in the navigation filter [1]: • Filter recalculation method: it consists of coupling two filters running at fast and slow rate [20]. The former incorporates the high-frequency measurements, whereas the latter is activated every time a delayed (e.g., slow and less frequent) measurement arrives. The method computes the entire trajectory of the state until the current step. Using this method, optimality is guaranteed at the cost of computational burden. • Alexander method: it consists of updating the covariance matrices at time s as if the delayed measurement arrived. Then, once measurements Y s are inserted at time k, the update is simply the standard Kalman filter one with a correction matrix term [21]. • Larsen extrapolation method: The method described in [21] requires the measurement matrix H s and the noise distribution matrix R s at time s. In the presented scenario, this is not valid: the measurement matrix depends on the relative positioning of the camera and craters. Larsen developed a measurement extrapolation method that does not require knowledge about the two matrices until time k [22]. This method is taken as a reference to implement a modified version suitable for the analyzed scenario.
The adaptation of the Larsen method for the measurement fusion is hereby described. For details on the derivation, the reader is suggested to refer to the original reference [22]. Several modifications were needed to solve two shortcomings of the original method: the incorporation of high-frequency altimeter and the extension to the nonlinear extended Kalman filter. For the former, the filter firstly computes the gain and the updates as in Algorithm 1 fusing fast altimeter measurements. For what concerns the delayed measurements, let us call the measurements coming from the time instant s = k − N as y s , which are incorporated at time instant k. The Larsen method consists of calculating an extrapolated measurements from y s to be integrated at time k, called y ext k,s : where y s = 0 because it represents a geometric epipolar constraint at any time. At each intermediate step between s and k a correction term M is calculated as: where the Kalman gain and measurement sensitivity matrix H k−i at step k − i does not reflect any update coming from the delayed measurement Y s . Then, the updates of the correction term are calculated as follows, modifying the correction equations in Algorithm 1: The covariance update is a modified version of the Joseph formula adapted to the original Larsen method. This is performed to ensure that the covariance matrix remains positive semi-definite. As seen in Equations (26) and (27), the extrapolation method always requires only two matrix multiplications at each time instant and the storage of two variables any time an image is acquired.

Numerical Results
The following section provides preliminary insights on the algorithm performance during a sample landing trajectory.

Scenario Description
The considered mission scenario consists of the spacecraft descent on the lunar South Pole from an altitude of 100 km down to 3 km. The landing area, within the South Pole, is defined during the mission. A 2D planar Moon landing is taken as reference, nevertheless the approach is easily applicable to a 3D scenario (including cross-range direction). Spacecraft trajectories are generated executing optimal guidance algorithm depending on the target location and thrust constraints. Moreover, due to the given landing location, it is critical to take into account illumination and shadowing condition. In particular, previous studies have highlighted that for the landing site selection it is fundamental to consider the South Pole regions that present areas in sunlight. Since the angle between the Moon rotation axis and the ecliptic is about 90 degrees, in the Polar Regions the topography plays a crucial role for the illumination conditions. In fact, areas at relatively high altitude can experience continuous periods of illumination (of several months), whereas some crater bottoms are always in shadow. The Moon landing mission scenario shall aim at landing sites (LS) with such characteristics. In such scenario, the navigation system can encounter highly varying illumination conditions, with low Sun elevation angle in the South Pole region and large shadow areas in the image. Figure 6 shows the assumed nominal phases for a Lunar landing mission. It can be seen how on-board navigation operates in two modes, absolute navigation mode [1,14] and relative navigation mode, discussed in this paper. In the following, the landing phases are detailed: • Parking orbit (PO): the spacecraft is in its orbital motion around the Moon and performs absolute navigation with respect to the lunar fixed frame. The parking orbit is assumed to be a circular orbit with constant altitude between 250 and 100 km. The altitude 100 km is the value assumed in the paper for the lunar pinpoint landing scenario. Absolute navigation is performed. In traditional algorithms, lunar maps and craters catalog are used to determine position and velocity. • Maneuver (DM): the spacecraft performs a tangential burn to lower the orbit perigee, inserting itself into an elliptical orbit. The lower the perigee, the lower the overall amount of fuel required for the landing maneuver. At the same time, the terrain topography pose a safety requirement on the minimum altitude of the perigee. Moreover, 15 km is a generally accepted value and is adopted as nominal value in this study. In an inertial reference frame, at the touchdown of the lander, horizontal velocity is required to match the velocity of the terrain due to the rotation of the Moon around its axis. Assuming a landing at the South Pole, the inertial tangential velocity of the lunar surface at the landing site can be neglected, in first approximation, and the overall landing trajectory is assumed to be planar (in the inertial reference frame). There are several strategies in the selection of sensors suite for relative and absolute navigation. The landing is performed using a camera with 40 • field of view, a 1024 × 1024 pixel sensor, and an altimeter. As summary, the nominal trajectory used for testing the navigation system is shown in Figures 7 and 8. The generation of synthetic images was performed using a custom rendering pipeline based on PANGU software, as shown in Figure 9 [23,24].

Feature Detection and Matching
This paper does not focus on the features detection and matching pipeline, as a robust and common method is used to retrieve these elements from the images. The ORB detector implemented in OpenCV [25] is used to process images, while a brute-force matching approach based on Hamming distance is used to perform the frame-to-frame features matching. A sample frame-to-frame matching is shown in Figure 10. The feature extraction and description procedure implements the following: Aerospace 2022, 1, 0 13 of 20 altimeter. As summary, the nominal trajectory used for testing the navigation system is shown in Figures 7 and 8. The generation of synthetic images was performed using a custom rendering pipeline based on PANGU software, as shown in Figure 9 [22,23].

Feature Detection and Matching
This paper does not focus on the features detection and matching pipeline, as a robust and common method is used to retrieve these elements from the images. The ORB detector implemented in OpenCV [24] is used to process images, while a brute-force matching approach based on Hamming distance is used to perform the frame-to-frame features matching. A sample frame-to-frame matching is shown in Figure 10. The feature extraction and description procedure implements the following: # # Feature extraction # --------------------kpkm1 = orb . detect ( imgOrbkm1 , None ) kpk = orb . detect ( imgOrbk , None ) # compute descriptors kpkm1 , deskm1 = orb . compute ( imgOrbkm1 , kpkm1 ) kpk , desk = orb . compute ( imgOrbk , kpk ) Once features are extracted from two consecutive images, the brute-force matcher object yields a set of corresponding features between the two consecutive frames. The set of matched features is ordered based on their Hamming distances, meaning that the most confident matches are ranked higher than less reliable ones. In coding, the instructions are as follows: Once features are extracted from two consecutive images, the brute-force matcher object yields a set of corresponding features between the two consecutive frames. The set of matched features is ordered based on their Hamming distances, meaning that the most confident matches are ranked higher than less reliable ones. In coding, the instructions are as follows: The average number of tracked features in consecutive images is roughly 500. Nevertheless, the number of features influences the computational complexity of the filter. In particular, the size of the epipolar constraint in Equation (16) scales linearly with the number of features, as demonstrated by Equation (13). For the sake of limiting the computational burden, a maximum number of matched features has been set for the presented simulation tests. In particular, two cases have been run with n f = 20 and n f = 100, alternatively. The reader is suggested to refer to OpenCV documentation for the details of the implementation [24].

Trajectory Estimation
The navigation algorithm has been implemented in Matlab-Simulink software. A python interface is used to link the output of the image processing to the input of the The average number of tracked features in consecutive images is roughly 500. Nevertheless, the number of features influences the computational complexity of the filter. In particular, the size of the epipolar constraint in Equation (16) scales linearly with the number of features, as demonstrated by Equation (13). For the sake of limiting the computational burden, a maximum number of matched features has been set for the presented simulation tests. In particular, two cases have been run with n f = 20 and n f = 100, alternatively. The reader is suggested to refer to OpenCV documentation for the details of the implementation [25].

Trajectory Estimation
The navigation algorithm has been implemented in Matlab-Simulink software. A python interface is used to link the output of the image processing to the input of the navigation module. The sample trajectory used for testing is the one described in Section 5.1. The proposed navigation system provides an estimate of the translation states only, as in [1], but relies on attitude determination to identify the camera pointing direction: then, attitude estimation errors could have an impact on navigation performances. The spacecraft rotational dynamics are not simulated: the navigation camera is assumed to maintain a nominal nadir pointing, while a Gaussian noise with standard deviation σ = 1°is added on the three Euler angles to represent attitude determination errors. The altimeter measurements are synthetically generated by randomly perturbing the vertical groundtruth position with a Gaussian noise. A standard deviation of 1 % of the current altitude is assumed, reflecting the actual behavior of the laser altimeter technology. The whole model is implemented in a Matlab-Simulink environment, with altitude measurements generated at frequency 8 Hz. The navigation filter parameters are listed in Table 1. The initial condition is given as a perturbed state with respect to the groundtruth with a Gaussian noise with variance σ 2 = 10 4 m 2 , which represents a potential initialization error when the navigation switches from absolute to relative mode. The estimation error remains bounded with a mean horizontal error and vertical error of less than 200 and 100 m, respectively, representing less than the 0.5% of the position vector magnitude. As showed in Figure 11, the number of matched features pairs that is processed by the algorithm has an impact only on the estimated covariance. In other words, this means that the number of features mainly influences the robustness of the filter without strongly affecting the accuracy of the estimation. This is partially expected, as each epipolar equation is a standalone geometric constraint. In principle, a minimum number of correspondences is needed for static (i.e., without navigation filters) pose estimation, as discussed in Sections 1.1 and 1.2; nevertheless, increasing the number of matches affects the covariance estimate at the cost of higher computational burden.

Monte Carlo Analysis
To analyze the robustness of the algorithm, a Monte Carlo analysis is performed to explore the uncertainties that typically affect the landing scenario and the filter tuning. In particular, the uncertainties distribution around the nominal values (cfr. Table 1) of the analyzed variables are reported in Table 2. The analyzed variables are deemed to be the most uncertain in a realistic deployment. They consist of the position and velocity initial conditions x 0 = [ r 0 , v 0 ] , initial covariance matrix P 0 , and the elementary measurement covariance matrix R k for features localization and altimeter. The aim of the Monte Carlo analysis is twofold: 1.
Objective 1: to analyze global results in terms of the landing trajectory mean error to estimate product confidence levels.

2.
Objective 2: to analyze the estimation error throughout the trajectory to evaluate filter robustness and convergence.
The number of runs of the statistical analysis is selected to represent the global response of the system to the assumed uncertainties, according to Hanson [26]. The goal of the analysis is to estimate the mean estimation error value through the landing trajectory. Therefore, in terms of statistical analysis, the goal is to estimate the mean estimation error bound within a box at ∼99.73% of probability. It is important to note that, in the order statistics approach (Objective 1), the necessary number of samples does not depend on how many uncertain variables are varied. The standard error of the mean is used to assess the accuracy of the Monte Carlo analysis. The standard error of the mean is calculated using the sample standard deviation: The standard error of the sample mean is an estimate of how far the sample mean is likely to be from the population mean, whereas the standard deviation of the sample is the degree to which individuals within the sample differ from the sample mean. As reported in Table 3, the standard error of the mean is lower than 1 m using 100 samples, which is considered as acceptable. Table 3 reports the mean and the ∼99.73% probability with 3σ bound for the filter performance.  Figures 12 and 13. The 3σ bound derived from the filter covariance estimation is compared with the 3σ bound at each time step of the Monte Carlo population. The filter covariance estimate is larger than the Monte Carlo variance, meaning that the filter is being conservative in its performance. Furthermore, the variables taken into account in the Monte Carlo analysis are a subset of the uncertainties that the filter is experiencing in the simulated scenario (e.g., variable altimeter measurements errors as a function of the altitude). The conservative behavior is favorable for the robustness of the filter. The vertical error shows an estimated covariance lower than the Monte Carlo bound for a limited portion of the trajectory. This may be due to the selected nominal case to which the filter covariance estimation refers to. Nevertheless, in Figure 13b, one can see that the filter covariance estimation still bounds correctly the estimation error of the whole Monte Carlo samples except for few error spikes. Figures 12b and 13b show a zoom of the overall plot to highlight the behavior of the different Monte Carlo samples.

Computational Time: Comparison with Essential Matrix Pose Recovery
This work presents a filter formulation that uses the feature correspondences directly in the update step; as depicted in Figure 1, a valid alternative is to pre-process the matched features to retrieve the pose from perspective geometry relationships. In particular, as discussed in Section 1.2, the 5-point algorithm can be used to find the essential matrix from a set of features correspondences. Once the essential matrix is determined, it can be decomposed using SVD decomposition. In general, four possible poses exists for a given E. Thus, the algorithm requires to verify possible pose hypotheses by performing the chirality check. The chirality check consists of the verification that the triangulated 3D points have positive depth [27]. By decomposing E, one only obtains the direction of the translation, so the function returns the translation unit vector. The altimeter measurement is used to scale the pose estimation.
To be consistent in the comparison, taking as reference Figure 1, only the relevant steps need to be timed. They consist of: • IEKF: -Implicit epipolar constraint (Section 4.2); -Correction step (Section 4.3): this step would be executed in all the presented methods, but its execution time is influenced by the size of the Kalman gain matrix, measurement matrix, and measurement covariance matrix. Hence, it is reasonable to maintain its contribution in the required computational time for the IEKF where the sizes of those matrices are not negligible.
The summary results for the computational times are reported in Table 4. The algorithms have been run in a Intel© Core™ i7-6500U CPU @ 2.50 GHz, thus the numbers in Table 4 are only for relative comparison and they are not representative of any on-board implementation. The histogram for the elapsed times for the 5-point + SVD decomposition method is reported in Figure 14. It can be noted that the distribution is quite spread with a significant standard deviation (cfr. Table 4) due to the preemptive RANSAC iterative algorithm. On average, the decrease in computational time using IEKF is roughly 75%.

Conclusions
This paper presented a method to perform relative (or local) terrain navigation using frame-to-frame features correspondences and altimeter measurements. In summary, the proposed image-based approach relies on the implementation of the implicit extended Kalman filter, which uses the epipolar constraint as the implicit measurement function. The navigation system relies also on a fast cycle with altimeter updates. This method allows to feed the navigation filter with the features coordinates directly to be fused with the altimeter measurements, without employing the pre-processing algorithms to retrieve the relative pose from features correspondences. For testing purposes, the altimeter acquisition was set to f alt = 8 Hz, whereas the images acquisition f opt = 1 Hz. Moreover, an extrapolation method has been developed to incorporate the intrinsic delay of the image processing routine, which was set to 1 s. The algorithm has been tested on a sample landing trajectory delivering good results in terms of position estimation with respect to the pinpoint landing location. A systematic numerical testing campaign is foreseen to assess the robustness of the navigation approach.