Crater Detection and Recognition Method for Pose Estimation

Abstract: A crater detection and recognition algorithm is the key to pose estimation based on craters. Because of the changing viewing angle and varying height, a crater is imaged as an ellipse in the landing camera and its scale changes. In this paper, a robust and efficient crater detection and recognition algorithm that fuses the information of sequence images for pose estimation is designed, which can be used in both the orbital-flight and landing phases. Our method consists of two stages: stage 1 for crater detection and stage 2 for crater recognition. In stage 1, a single-stage network with dense anchor points (dense point crater detection network, DPCDN) is conducive to dealing with multi-scale craters, especially small and dense crater scenes. The fast feature-extraction layer (FEL) of the network improves detection speed and reduces network parameters without losing accuracy. We comprehensively evaluate this method and present state-of-the-art detection performance on a Mars crater dataset. In stage 2, taking the encoded features and intersection over union (IOU) of craters as weights, we solve a weighted bipartite graph matching problem: matching the craters in the image with the previously identified craters ("frame-frame match", or FFM) and with the pre-established crater database ("frame-database match", or FDM). Combining FFM with FDM, recognition runs in real time on a CPU (25 FPS) with an average recognition precision of 98.5%. Finally, the recognition result is used to estimate the pose with the perspective-n-point (PnP) algorithm, and results show that the root mean square error (RMSE) of the trajectories is less than 10 m and the angle error is less than 1.5 degrees.


Introduction
Soft landing on the surface of a planet is one of the key technologies for space exploration missions and requires high positioning accuracy. The pose-estimation method based on inertial navigation has accumulated errors, and the landing error is within a few kilometers [1], while high-precision landing missions require landing errors of less than 100 m or even 10 m. Terrain relative navigation (TRN) [2] uses an optical camera to capture images and calculates information about the pose of the spacecraft. The optical flow method or feature-tracking method is a kind of TRN but cannot give the absolute pose [3]. Another method of TRN matches the image obtained by the image sensor with the image of the vicinity of the flight area and the stored terrain feature database to calculate the absolute pose. Craters are widely present on the surface of planets and are an ideal topographic feature. The NEAR mission successfully proved that high-precision pose information can be obtained using crater features [4]. Thus, crater-based TRN is possibly one of the most suitable solutions for pose estimation in soft landing missions.
Crater-based TRN has three parts: processing the sensor input, i.e., crater detection; matching the sensor data with a database of topographic features, a process called recognition; and, finally, using the correlated data to calculate the pose. In this paper, crater image-processing methods for pose estimation are discussed, including crater detection and recognition, both of which are critical to the final positioning accuracy. After recognition, perspective-n-point (PnP) [5] is used to solve for the pose. During landing, the craters imaged in the landing camera are always ellipses and their scale changes, so the detection and recognition method must be adaptable to these changes.
The accuracy and robustness of crater detection are sensitive to lighting conditions, viewing angle, and degree of noise. Large changes in crater scale, overlap, and nesting of different craters have challenged detection algorithms. At present, neural networks have been successfully applied in many computer-vision tasks; they can deal with complex scenes and thus comprise a possible solution. In the literature, one type of research uses a convolutional neural network to segment an image, separate the ellipse from the background, cluster the ellipse through a clustering algorithm, extract the edge of the crater, and finally fit the ellipse equation. This solution can obtain the complete information of the crater, which is beneficial to the recognition algorithm, but the detection algorithm is complicated. Klear [6] uses the UNet [7] network to predict the probability that each point may belong to the edge of the crater and detects the crater through the edge-fitting method. Silburt and Christopher [8] also use the UNet network, but the image used in their work is a digital elevation map (DEM), and there is no need to consider changes in lighting conditions. The network structure of Lena [9] is the same as that proposed in Ref. [8]. Lena expanded the data and used DEM images and optical images to analyze the effects of lighting factors in detail, making the network more robust to lighting. Another solution is using an object-detection neural network to detect the crater target and obtain the crater circumscribed rectangle. When the flying height is much larger than the radius of the crater, the rectangular center can be approximated as the crater center [10]. The algorithm complexity of this method is lower and using bounding boxes can improve the matching accuracy.
Recognition is the process of associating the crater in an image with the crater in the pre-established crater database, which can be divided into recognition with prior information and "lost in space". The recognition algorithm should have strong robustness and cannot occupy a large amount of time for the entire navigation task.
If the prior pose existing error is known, the craters in the database can be projected to the current camera field of view, using an error function between the projected image and the real image to calculate the pose through optimization. Leroy et al. [11] used the affine transformation relationship of two crater sets on small celestial bodies for matching, which requires accurate prior knowledge. Clerc et al. [12] combined this type of recognition method with the random sample consensus (RANSAC) method to further improve its accuracy. Cheng et al. [13] used the affine invariant feature of the ellipse to select three possible matching pairs of craters and calculate their attitudes and then projected the craters from the database onto the image to verify the guessed matching pairs. This method does not require prior information but will take a significant amount of time to verify the position. In Refs. [14,15], it was assumed that the camera shoots vertically to the ground, which can realize absolute pose estimation without prior information. In addition to matching craters in the image with the pre-established database, it can also match images between sequences, which can provide more information and accelerate recognition. The matching methods between sequence images include pattern matching and image correlation-based matching [16], among others.
In a crater-based terrain navigation system, the recognition algorithm depends on the output of the detection algorithm, yet many researchers study the detection and recognition algorithms separately. With CraterIdNet [17], a fully convolutional network was proposed for end-to-end crater detection and recognition. In that work, a region proposal network [18] is used to detect craters, assuming that each crater is circular; a grid pattern is then generated for each detected crater and input into a classification network. To generate the grid pattern, CraterIdNet assumes that the camera shoots vertically to the ground and requires the height as an input, so it cannot be used to calculate attitude, which limits its scope of application. By fusing the information of sequence images (projecting the pre-established database using the previous pose, or matching craters between consecutive frames), our method does not need to assume that the camera shoots vertically.
Overall, the contribution of this research is to comprehensively study the crater detection and recognition algorithm and demonstrate a robust and efficient crater detection and recognition system that fuses the information between sequence images. The system consists of two stages: stage 1 for detection and stage 2 for recognition. In stage 1, adopting the idea of anchor-free object detection networks, a dense point crater detection net (DPCDN) with densified anchor points is designed. The dense anchor points of the DPCDN benefit the detection of small and dense craters. DPCDN outputs the circumscribed rectangle of each crater as the input of the recognition algorithm. In stage 2, taking the encoded features and intersection over union (IOU) of craters as weights, a bipartite graph matching algorithm matches the craters output by the DPCDN with the previously identified craters and with the pre-established crater database, achieving real-time speed on a CPU with high precision.
The rest of this paper is organized as follows. In Section 2, the method proposed in this paper is introduced, i.e., the dense point crater detection network in Section 2.1 and crater recognition algorithm in Section 2.2. In Section 3, the experimental results on crater detection and recognition as well as pose estimation are described. In Section 4, the results are discussed, and conclusions and future research plans are presented in Section 5.

Methodology
The workflow of the detection and recognition system is shown in Figure 1, and it consists of two stages: stage 1 for crater detection and stage 2 for recognition. In stage 1, to effectively detect small-scale craters and dense crater scenes, a crater detection network with dense anchor points is proposed, i.e., the "dense point crater detection network," or DPCDN. In stage 2, using the DPCDN output, the Kuhn-Munkres (KM) algorithm [19,20] is used to quickly match craters between frames and between the image and the pre-established database, making full use of the information in the sequence images. The image taken at time k is input into the DPCDN, its features are extracted by the fast feature-extraction layer (FEL) and enhanced by the feature pyramid network (FPN [21]) structure, and the detection head with dense points is then used to detect craters. The FEL and the dense point detection head we designed are described in detail below. The crater bounding boxes are matched with the craters in the pre-established database to complete the recognition, which is called "frame-database matching", or FDM. Another matching route is matching frame k's craters with frame k − 1's craters; if the craters in frame k − 1 have been identified, frame k's craters can also be identified. This process is called "frame-frame matching," or FFM. The latter is faster than the former.
DPCDN is a one-stage, anchor-free target detection network, and its output is a bounding box circumscribed by the crater. Compared with the nearest neighbor matching using the center point, using bounding boxes can improve the matching accuracy. In the case of dense craters, the distance between the center points is more likely to be mismatched. Figure 2 shows this result intuitively.
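As an illustration, the IOU used as the physical-space distance between two bounding boxes can be computed as follows. This is a minimal sketch assuming the (x, y, w, h) box convention defined in Section 2.2, with (x, y) the top-left corner; the function name is illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x, y, w, h),
    where (x, y) is the top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap extents along each axis (zero if the boxes are disjoint).
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

For identical boxes the IOU is 1, and for disjoint boxes it is 0, so 1 − IOU is a natural matching cost.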

Figure 1.
Crater detection and recognition system workflow. The whole workflow consists of two stages. In stage 1, a dense point crater detection network obtains the craters in frame k and frame k − 1. Then, in stage 2, we use the KM algorithm to match frame k's craters with frame k − 1's craters or with the pre-established database.

Stage 1: Crater Detection
Inspired by the idea of using a convolutional neural network to detect craters [18], a dense point crater detection network (DPCDN) was designed. The architecture of the proposed network is shown in Figure 3. The crater detection algorithm used for navigation should be efficient and robust. A fast feature-extraction layer (FEL) was designed as the backbone. In the anchor point design, each position of the low-level feature map (P2) is associated with multiple anchor points to detect dense small-scale craters.
FEL is used to quickly extract the features of the input image, as shown in the upper part of Figure 3a, and has four convolutional layers. The stride between each convolution layer and the next is 2. Both conv1 and conv2 are followed by a CReLU [22] activation function, CReLU(x) = Concat[ReLU(x), ReLU(−x)]. The detail of the FEL is shown in Figure 4a. Shang et al. [22] proved that the weights learned by the low-level network are approximately symmetrical; that is, for the weights of 2C channels, those of the first C channels and the last C channels are approximately in an opposite relationship. Exploiting this, the activation function doubles the number of output channels by simply concatenating the negated outputs before applying ReLU.
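The channel-doubling behavior of CReLU can be sketched in a few lines of NumPy; the array layout (channels last) is an assumption for illustration.

```python
import numpy as np

def crelu(x):
    """CReLU activation [22]: concatenate ReLU(x) and ReLU(-x) along the
    channel axis, doubling the number of output channels."""
    return np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)], axis=-1)
```

Because negating the input recovers the approximately "mirrored" filters, the backbone needs only half as many learned channels, which is how the FEL reduces parameters without losing accuracy.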
After the features extracted by the FEL are enhanced by the feature pyramid network (FPN [21]), the feature maps P_i ∈ R^(H×W×C), with i ∈ {2, 3, 4, 5}, are obtained for detection. Each location (x, y) on the feature map P_i is mapped to the input image as (xs + s/2, ys + s/2), where s is the total stride up to that layer. The detection layer outputs four distances (t, l, b, r) from the center to the four sides of the bounding box. To suppress low-quality detected bounding boxes, the "center-ness" of the location is used and low center-ness outputs are ignored. The center-ness reflects how close the anchor point is to the target center, and its formula is

centerness = sqrt( (min(l, r) / max(l, r)) × (min(t, b) / max(t, b)) ).

The center-ness is useful for large-scale craters because the low-quality bounding boxes can be filtered by non-maximum suppression (NMS). However, when the crater scale is small, there are very few locations associated with it; only one location may be available to detect a small-scale crater, as shown in Figure 4c. To avoid this situation, the anchor points are made dense: each location (x, y) in the feature maps is expanded to (x + i/s, y + i/s), i ∈ Z and i < s, so that each location is associated with more than one location in the input image. After densification, more effective output bounding boxes are obtained because more anchor points fall in the centers of the craters.
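The center-ness score above can be computed directly from the four regressed distances. This sketch assumes the standard anchor-free (FCOS-style) formulation that the DPCDN follows; the function name is illustrative.

```python
import math

def centerness(t, l, b, r):
    """Center-ness of an anchor point given the distances (t, l, b, r)
    from the point to the top, left, bottom, and right sides of the
    ground-truth box.  Equals 1 at the box center and tends to 0 as the
    point approaches an edge."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))
```

A point at the exact center (equal distances to all sides) scores 1; strongly off-center points score low and are suppressed before NMS.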

Stage 2: Crater Recognition
The purpose of crater recognition is to obtain information such as the location of the crater in the pre-established database and its three-dimensional coordinates in the stellar coordinate system. Either the crater can be directly matched with the database, or the crater detected in the kth frame can be matched with the craters that have been successfully matched in the previous frame. In this paper, the former is called the "frame-database match", or FDM, and the latter the "frame-frame match", or FFM. To achieve real-time speed, both matching strategies employ the KM algorithm for data association because it is effective and fast. In feature space, an encoding method is designed to measure the distance between the craters to be matched, and in physical space the IOU is used as the distance. The two distance measures improve the robustness of the algorithm. Figure 5 shows the workflow of the recognition algorithm. The bounding boxes detected in the kth frame are denoted Θ_k = {C_i^k = (x, y, w, h)}, where k is the sequence number of the frame, C_i^k is the ith crater of the kth frame, and x, y, w, and h are the coordinates of the upper left-hand corner of the target and its width and height, respectively.
The FFM is the matching of the sets Θ_k and Θ_{k−1}. To match the frame to the database, the pre-established database must be projected to the current field of view according to the pose, Ω_k = Γ(Ω, pose_k), where Γ represents the camera projection equation; the FDM then matches Ω_k and Θ_k. Both use the KM algorithm, a weighted bipartite graph matching algorithm whose weights are the distances between craters, comprising the IOU and the encoded features: when there is a large overlap area, the IOU distance metric is used to match the crater, and when the IOU fails, the distance between the encoded features is used, because the features are more robust to noise.
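The assignment problem that the KM algorithm solves can be illustrated on a small cost matrix (e.g., entries 1 − IOU, or feature distances). The KM algorithm itself runs in O(n³); the exhaustive search below is only a sketch of the problem being solved, for small examples.

```python
from itertools import permutations

def best_assignment(cost):
    """Minimum-cost one-to-one matching of rows to columns of a square
    cost matrix, by exhaustive search.  Illustrates the weighted
    bipartite matching that the KM algorithm solves efficiently."""
    n = len(cost)
    best, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best:
            best, best_perm = c, perm
    return best, best_perm
```

In practice, a polynomial-time implementation (e.g., `scipy.optimize.linear_sum_assignment`) would be used in place of the brute force.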
The encoding method proceeds as follows and is shown in Figure 6.

(1) Initialize the feature vector v = (0, 0, …, 0)^T ∈ R^(n×1), where n = 2π/e and e is the discretization factor.
(2) Using the "constellation" composed of the crater to be matched and its m surrounding craters, calculate the angle θ between the craters; for θ ∈ [qe, (q + 1)e), set v[q] = v[q] + 1. This process is a discretization, and the discretization makes the feature more robust.
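The two steps above can be sketched as follows. The text does not fully specify how the angle θ is defined within the constellation; this sketch assumes it is the bearing from the central crater to each neighbor, and the function name is illustrative.

```python
import math

def encode_crater(center, neighbors, e=math.pi / 8):
    """Angle-histogram feature for one crater: for each of the m
    surrounding craters, the angle theta in [0, 2*pi) is discretized
    into bins of width e (the discretization factor), giving a feature
    vector of length n = 2*pi / e."""
    n = round(2 * math.pi / e)
    v = [0] * n
    cx, cy = center
    for (x, y) in neighbors:
        # Bearing angle of the neighbor, wrapped into [0, 2*pi).
        theta = math.atan2(y - cy, x - cx) % (2 * math.pi)
        v[min(int(theta // e), n - 1)] += 1
    return v
```

Because each neighbor contributes only a bin count, small perturbations of the detected centers rarely change the vector, which is the robustness the discretization provides.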

Frame-Frame Match
"FFM" matches the crater detected in the kth frame Θ k with the craters Θ = Θ {0,1,...,k−1} that have been successfully matched in the image of the previous frame. There are three statuses of the crater in the Θ , namely "new" crater S 1 , "mature" crater S 2 , and "old" crater S 3 . The status is updated by the matching result. If a crater is detected for the first time, i.e., the crater is not associated with any previous craters, then the crater will be added to the set Θ , and the status is "new". If a crater in the Θ is associated with a crater for N consecutive frames, its status is updated to "mature." If a crater lost matching for M consecutive frames, the status changes to "old" and will be deleted later. The state-transition process is shown in Figure 7a. are three statuses of the crater in the Θ′, namely "new" crater S , "mature" crater S , and "old" crater S . The status is updated by the matching result. If a crater is detected for the first time, i.e., the crater is not associated with any previous craters, then the crater will be added to the set Θ′, and the status is "new." If a crater in the Θ′ is associated with a crater for N consecutive frames, its status is updated to "mature." If a crater lost matching for M consecutive frames, the status changes to "old" and will be deleted later. The state-transition process is shown in Figure 7a. To calculate the IOU distance, it is necessary to predict the position of the crater in Θ′ under the current field of view, which is shown in Figure 8. In this paper, the Kalman filter [23] is used to predict it. The state of a crater in Θ′ is expressed as Among them, x and y are the coordinates of the center of the target frame, r is the aspect ratio, h is the height, the modeling is a linear motion model, and the remaining variables are the corresponding speeds. 
The prediction equations are

x′ = F x,  P′ = F P F^T + Q,

where x′ is the predicted value of the state, F is the state-transition matrix, P′ is the predicted covariance matrix, and Q is the system noise. The newly arrived detections at the kth moment and the matched craters in Θ′ are associated through the KM algorithm, and the matching result is input as the observation into the update equations of the Kalman filter:

y = z − H x′,  K = P′ H^T (H P′ H^T + R)^(−1),  x_{k+1} = x′ + K y, (10)

where z is the measurement vector of the bounding box (excluding the velocity terms), H is the measurement matrix, R is the measurement-noise matrix, and K is the Kalman gain. Summarizing the above process, the FFM algorithm executes as follows.
1. The Kalman filter calculates the predicted value Ĉ_{k−1} of each crater C_{k−1}; calculate the IOU between Ĉ_{k−1} and C_k.
2. Encode the features of the craters C_k and calculate the distances between features.
3. Input the distances into the KM algorithm, matching craters by IOU first, and match the remaining unmatched craters using the feature distances.
4. Use the successfully matched craters to update the Kalman-filter parameters and update the statuses of the craters in Θ′.
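The Kalman prediction and update equations used in step 1 and step 4 can be sketched generically in NumPy. The full 8-dimensional crater state and the exact noise matrices are implementation choices not fully specified in the text, so the example below uses a generic state for illustration.

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Prediction step: x' = F x, P' = F P F^T + Q."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, z, H, R):
    """Update step: y = z - H x', K = P' H^T (H P' H^T + R)^-1,
    x = x' + K y, P = (I - K H) P'."""
    y = z - H @ x_pred                      # innovation
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new
```

With a constant-velocity transition matrix F, the predicted box position is simply the previous position advanced by one frame of motion, which is what makes the IOU distance meaningful between frames.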

Frame Database Match
The pre-established database contains the three-dimensional coordinates X_i = (x_i^w, y_i^w, z_i^w), where w denotes the world coordinate system (the celestial Cartesian coordinate system). In the first frame, assuming that a rough pose P_initial is available and the camera intrinsic matrix K is known, the craters in the database are projected to pixel coordinates. It is then necessary to calculate the circumscribed rectangle of the ellipse projected from each crater onto the pixel plane: m points are taken on the crater rim circle and projected onto the pixel plane to obtain the ellipse equations and the corresponding circumscribed rectangles of all craters in the database under the current field of view. This crater set is denoted Θ_global. The IOU distances and feature-encoding distances between the target boxes in Θ_global and Θ_k are calculated and input into the KM algorithm to complete the match. This process is similar to FFM.
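The projection step can be sketched as follows, assuming the standard pinhole model p ~ K(R X_w + t); the function name and the sampled-rim-point input are illustrative.

```python
import numpy as np

def project_crater_box(rim_points, K, R, t):
    """Project 3D points sampled on a crater rim into pixel coordinates
    via the pinhole model, then return the axis-aligned circumscribed
    rectangle (x, y, w, h) of the projected ellipse.

    rim_points: (m, 3) array of world coordinates on the rim circle.
    K: 3x3 camera intrinsic matrix; R, t: camera pose (world -> camera).
    """
    Xc = R @ rim_points.T + t.reshape(3, 1)  # camera frame, 3 x m
    proj = K @ Xc
    uv = proj[:2] / proj[2]                  # perspective division
    x0, y0 = uv.min(axis=1)
    x1, y1 = uv.max(axis=1)
    return x0, y0, x1 - x0, y1 - y0
```

Applying this to every crater in the database under the rough pose P_initial yields the set Θ_global whose boxes are matched against Θ_k.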
During navigation, the recognition algorithm first uses FFM; when the number of successful matches is less than a threshold, FDM is used. The input pose of FDM is the prediction, through the Kalman filter, of the pose output for the previous frame. Finally, the matching result is input into the EPnP algorithm to obtain the pose of the current frame. On the one hand, the pose is output directly; on the other hand, as FDM input, it is used to rematch the craters of the current frame with the database and update the statuses of the craters in Θ′.

Experimental Dataset
At present, many researchers have studied crater detection methods and established crater databases of the Moon, Mars, and other planets through manual or automatic detection [8,24-27]; herein, the Bandeira Mars crater database is used as experimental data (Figure A1). The Bandeira database covers a 40 km × 59 km region, an area of 2360 km², located between −47.66°E and −48.68°E longitude and 7.28°N and 7.95°N latitude on Mars. This region contains a huge number of craters: a total of 3050 hand-marked craters with radii between 20 and 5000 m. This dataset has been used in many studies, which facilitates the comparison of method performance [17].
Under the Robotic Operating System (ROS), the abovementioned Mars area was input to the gazebo [28] physical simulation platform, and the PX4 [29] flight control system equipped with a camera was used to obtain the flight attitude and camera data, as shown in Figure 8.
Using Gazebo, the ascent, descent, and level flight of the spacecraft were simulated. The sequence rate is 10 frames per second (fps). The experiment used four sequences, Seq1-4. The shooting areas of the different sequences differ, as does the average crater density. Figure 9 shows their ground-truth trajectories. The image resolution used in the experiment was 1024 × 1024, and the flight altitude of Seq1-4 was between 600 and 1000 m.

Training Details
During training, the Bandeira database and images taken by the high-resolution stereo camera (HRSC) in the Express spacecraft were used to form a dataset. There are 9000 images in the training set and 3000 in the test set. The resolution of each image was 600 × 600. The strategies of data augmentation include random rotation, random flips, random shift, and random Gaussian blur.
The strides of the feature-extraction layers are 4, 8, 16, and 32. The positive samples in the training set were allocated to the different detection layers according to their sizes; the four detection layers were associated with positive-sample size ranges of [−1, 12), [12, 32), [32, 64), and [64, +∞), respectively. The stochastic-gradient-descent optimizer was used, and the parameters were randomly initialized with the Kaiming strategy [30]. The number of training epochs was 30 and the learning rate was 0.01.

DPCDN Results
In this subsection, the detection performance of DPCDN is compared with that of other methods (Urbach [31], Ding [32], CraterIdNet, and Bandeira). The results are shown in Table 1, which reports the average F1 scores in the West, Central, and East regions. The smallest crater instance detected by DPCDN is 6 pixels, smaller than the other methods can detect, owing to the dense anchor points, and DPCDN's performance was better than that of the other networks. To facilitate comparison, only craters larger than 16 pixels are compared in the present work, because the previous methods cannot detect smaller craters.
The FEL in DPCDN improved the detection speed. The performance of DPCDN was then compared with different backbone networks in terms of parameter count, operating speed, precision, recall, and F1 score. Operating time refers to the time from image input through network forward propagation to the output NMS operation. Table 2 shows the results. The backbone networks include Vgg16 [33], Resnet18 [34], and the FEL proposed in this paper. The F1 scores of all backbones exceed 97%, and the number of parameters of the FEL is much lower than that of Vgg16 and Resnet18, indicating that the FEL compresses parameters without losing performance. The dense anchor points effectively improve the detection of dense, small craters, which is why DPCDN outperforms the other methods, and they also improve detection precision. The above experiments use the F1 score at IOU = 0.5 for evaluation. However, IOU = 0.5 is insufficiently strict for crater-based terrain relative navigation, since the IOU indicates how close the detection bounding box is to the ground truth. The average precision (AP) is therefore used to evaluate the benefit of the dense anchor points. The AP is the area enclosed by the precision-recall (PR) curve and the horizontal axis at a given IOU threshold. Figure 10 shows the PR curves with and without dense anchor points at IOU = 0.8; the interval between the two curves in the blue box represents the difference between the two methods. It can be seen from the figure that, without dense anchor points, the precision is close to 0 at a recall of 0.87, while with dense anchor points the precision is still 0.9 at the same recall. Using the 0.5-0.95 IOU thresholds proposed by MS COCO [35], the mean AP (mAP) is used as the evaluation metric.
The mAP of the network without dense anchor points is 0.767, and that of the network with dense anchor points is 0.855. The performance improved by 11.47%.
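The AP evaluation above can be sketched as follows, using the standard all-point interpolation of the PR curve; this is a generic sketch, not the authors' exact implementation:

```python
import numpy as np

def average_precision(precision, recall):
    """Area under the precision-recall curve (all-point interpolation).

    precision, recall: per-detection values sorted by descending confidence.
    """
    # Append sentinel values so the curve spans recall 0..1.
    p = np.concatenate(([0.0], precision, [0.0]))
    r = np.concatenate(([0.0], recall, [1.0]))
    # Make precision monotonically decreasing (upper envelope of the curve).
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

The mAP in MSCOCO style is then the mean of this value over PR curves computed at IOU thresholds 0.5, 0.55, ..., 0.95.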
Using the same structure, DPCDN was tested on the lunar database after retraining to verify the robustness of the network to different scenarios. Figure 11 shows the experimental results. The images are from the 120-m-resolution scientific data of the Chang'e-1 CCD stereo camera (Figure A2) [36]. The data cover the entire Moon using a simple cylindrical projection. The longitude and latitude ranges were chosen as [−53°, −23°] and [−15°, −8°], respectively. This area exhibits large changes in illumination, complex crater scenes, large scale changes, and large differences from the Mars scenes. The crater data used are the lunar data released by Robbins [25] in 2018, which cover craters with a radius of more than 1 km. The F1 score is 95.6%. Figure 11 shows the results for both the Moon and Mars.
Experiments were also conducted on detection precision, because crater identification requires the crater position, which can simply be taken as the center of the bounding box. Using the root-mean-square error (RMSE) to evaluate the precision, the RMSE over a total of 24,629 craters (with repeated craters and different noises) in 600 randomly generated images is 0.46 pixels, which means that DPCDN achieves pixel-level accuracy.
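The pixel-level position evaluation can be reproduced with a short sketch; the box layout (x_min, y_min, x_max, y_max) and the function name are illustrative assumptions:

```python
import numpy as np

def center_rmse(pred_boxes, gt_boxes):
    """RMSE between predicted and ground-truth crater centers.

    Boxes are (x_min, y_min, x_max, y_max); the crater position is
    taken as the bounding-box center, as in the evaluation above.
    """
    pred = np.asarray(pred_boxes, dtype=float)
    gt = np.asarray(gt_boxes, dtype=float)
    pred_c = (pred[:, :2] + pred[:, 2:]) / 2.0   # box centers
    gt_c = (gt[:, :2] + gt[:, 2:]) / 2.0
    err = np.linalg.norm(pred_c - gt_c, axis=1)  # per-crater distance
    return float(np.sqrt(np.mean(err ** 2)))
```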

Recognition Validation
Random position noise, as well as length and width noise, was added to the ground-truth bounding boxes to simulate the output of the detection algorithm. First, the performance of the FDM and FFM was evaluated separately; finally, the pose-estimation performance of the entire system was checked.

Validation of FDM Performance
Using bounding boxes instead of points as matching objects improves the success rate of matching and is more robust to noise. Figure 12 compares the matching accuracy of the FDM algorithm under different disturbance levels. The perturbation angle ranges from 0° to 4° with a step of 0.4°, and the position perturbation ranges from 0% to 25%. The relationship between the matching rate and the noise level is shown in Figure 12. In Ref. [3], at an angle perturbation of 4° the matching rate is already less than 20%, whereas the FDM matching rate is 58%. The matching method used in this paper is more robust than that in the literature [3] because using the IOU of the bounding boxes together with the feature information improves the matching rate.
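A minimal sketch of the weighted bipartite matching step follows. The mixing weight `alpha` and the feature-similarity term are illustrative assumptions, not the paper's exact formulation; SciPy's `linear_sum_assignment` plays the role of the KM algorithm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IOU of two (x_min, y_min, x_max, y_max) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match_craters(det_boxes, db_boxes, det_feat, db_feat, alpha=0.5):
    """Weighted bipartite matching of detections against database craters.

    The edge weight mixes bounding-box IOU with an encoded-feature
    similarity; alpha is an assumed mixing coefficient.
    """
    n, m = len(det_boxes), len(db_boxes)
    weight = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            feat_sim = 1.0 / (1.0 + np.linalg.norm(det_feat[i] - db_feat[j]))
            weight[i, j] = alpha * iou(det_boxes[i], db_boxes[j]) \
                           + (1 - alpha) * feat_sim
    # Maximize total weight = minimize negated weight (Kuhn-Munkres).
    rows, cols = linear_sum_assignment(-weight)
    return list(zip(rows.tolist(), cols.tolist()))
```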

Validation of FFM Performance
Matching accuracy and error between images are used herein to evaluate FFM performance. These two indicators can be used to evaluate the matching effect over the entire time series:

accuracy = 1 − (C_miss + C_fp + C_switch) / C_all, (13)

where C_all represents all consecutive crater sequences, C_miss represents missed craters, C_fp represents wrongly matched craters, and C_switch represents the number of track switches; and

error = (1/N) ∑_{i=1}^{N} dist(pos_i^est, pos_i^gt), (14)

where N represents the total number of craters to be matched; the formula calculates the average distance between the predicted value and the ground truth.

Table 3 lists the accuracies and errors of FFM for different sequences. With different crater densities, the matching accuracy is between 96% and 97%. The Gaussian noise used in the experiments has a mean of 0 and a variance of 0.5 pixels. Density affects the matching speed because the time complexity of the KM algorithm used in the matching step is related to the number of craters. Figure 13 visualizes the matching results between frames, where the images of (1-a)-(2-a) are the experimental sequences discussed in this paper and the algorithm is applied to the landing sequence of Chang'e 3. The yellow rectangles are the DPCDN detection results, and rectangles of the same color in different images indicate the same crater.
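The two sequence metrics can be sketched as below, assuming a MOTA-style accuracy built from the four counts defined above (the paper's exact normalization may differ):

```python
import numpy as np

def ffm_accuracy(c_all, c_miss, c_fp, c_switch):
    """Matching accuracy over a sequence from the four error counts."""
    return 1.0 - (c_miss + c_fp + c_switch) / c_all

def ffm_error(pred_pos, gt_pos):
    """Mean distance between matched predicted and ground-truth
    crater positions (N x 2 arrays of image coordinates)."""
    pred = np.asarray(pred_pos, dtype=float)
    gt = np.asarray(gt_pos, dtype=float)
    return float(np.mean(np.linalg.norm(pred - gt, axis=1)))
```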

Validation of Recognition
The recognition algorithm combines the matching results of FFM and FDM. When the number of correctly matched craters is greater than four, the recognition is considered successful [17], because more than four correctly recognized craters can be input into the PnP algorithm to calculate the pose; inputting wrongly identified craters into PnP reduces the accuracy of the pose estimation and can even produce an incorrect result. In related papers, a mismatched input is called an "outlier" and its proportion is called the outlier rate. The recognition algorithm is evaluated by accuracy in the present paper, where accuracy = 1 − outlier rate. Table 4 shows the average accuracy of the different sequences; the accuracy rate is 98.5%. Seq2 and Seq4 have more frequent acceleration changes in flight, so their recognition accuracy is lower than that of Seq1 and Seq3. In existing research, a Kalman filter is used to estimate the pose, and the pose is used for crater recognition; the FDM in this paper is similar to this process. The matching between sequence images is then integrated: in the FFM, the Kalman filter estimates the parameters and motion state of each crater simultaneously, which accelerates recognition, because FDM requires an additional projection operation and ellipse fitting for each crater in the view. Table 4 shows the recognition speed of FDM alone and of the combination of FDM and FFM. The results show that the latter's average speed is 25.22 fps, which is 1.99 times faster than the former. An Intel Core i7 processor and the PyTorch library (Python) were used in the test, and the matching threshold used in the recognition algorithm was 10.
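The per-crater filtering used in FFM can be illustrated with a minimal constant-velocity Kalman filter over the image position; the state layout and noise covariances below are assumptions for the sketch, not values from the paper:

```python
import numpy as np

class CraterTrack:
    """Constant-velocity Kalman filter for one crater's image position."""

    def __init__(self, x, y, dt=1.0):
        # State: [x, y, vx, vy]; measurement: [x, y].
        self.s = np.array([x, y, 0.0, 0.0])
        self.P = np.eye(4) * 10.0               # initial uncertainty (assumed)
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * 0.01               # process noise (assumed)
        self.R = np.eye(2) * 0.5                # measurement noise (assumed)

    def predict(self):
        """Propagate the state one frame; returns predicted position."""
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]

    def update(self, zx, zy):
        """Fuse a detected bounding-box center; returns filtered position."""
        y = np.array([zx, zy]) - self.H @ self.s        # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)        # Kalman gain
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.s[:2]
```

The predicted position gates the candidate boxes before the KM matching step, which is what allows FFM to skip the per-crater projection and ellipse fitting that FDM needs.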

Pose-Estimation Experiment Results
The above-described experiments verify the performance of the detection and recognition algorithms. The error between the ground-truth pose and the pose estimated by the proposed system was tested next. The proposed crater detection and recognition system requires only the initial pose of the first frame as input; no additional input is required.
The pose estimated from the craters using PnP is projected onto the image sequence, and the result is shown in Figure 14. The projection result is consistent with the real crater ellipses, indicating that the pose estimation is effective. The sequences are input into the detection and recognition system, and the estimated pose is the output. Figure 15 shows the estimated and ground-truth poses. The relative pose error (RPE) and absolute trajectory error (ATE) are then computed, and the results are shown in Table 5. The RPE measures the local accuracy of the trajectory over a fixed time interval, and the ATE compares the absolute distances between the estimated and ground-truth trajectories [37]. Seq3 tests the situation in which the spacecraft flies downwards approximately vertically, which is simpler, so the pose-estimation accuracy is higher. Seq4 has larger motion changes and more complex conditions, so its accuracy is lower than that of the other sequences. Results show that the RMSE of the trajectories is less than 10 m and the angle error is less than 1.5°. This accuracy meets the requirements of a soft landing.
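The ATE and translational RPE used above can be computed as follows; this sketch assumes trajectories already expressed in the same frame and omits the alignment step described in [37]:

```python
import numpy as np

def ate_rmse(est_traj, gt_traj):
    """Absolute trajectory error: RMSE of per-frame position differences
    between (N, 3) estimated and ground-truth trajectories."""
    est = np.asarray(est_traj, dtype=float)
    gt = np.asarray(gt_traj, dtype=float)
    d = np.linalg.norm(est - gt, axis=1)
    return float(np.sqrt(np.mean(d ** 2)))

def rpe_trans(est_traj, gt_traj, delta=1):
    """Translational relative pose error over a fixed frame interval:
    RMSE of the difference between estimated and true displacements."""
    est = np.asarray(est_traj, dtype=float)
    gt = np.asarray(gt_traj, dtype=float)
    d_est = est[delta:] - est[:-delta]
    d_gt = gt[delta:] - gt[:-delta]
    d = np.linalg.norm(d_est - d_gt, axis=1)
    return float(np.sqrt(np.mean(d ** 2)))
```

A constant offset between the trajectories inflates the ATE but leaves the RPE at zero, which is why the two metrics are reported together.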

Discussion
In deep-space exploration missions, crater-based optical autonomous navigation algorithms must be able to detect and identify craters quickly and robustly. The recognition algorithm is related to the output of the detection network, and one must consider the detection and recognition algorithm as a whole. Using the object detection network instead of the segmentation network for crater detection can save the time spent on crater edge detection and fitting. According to the bounding box of the object detection network, the KM algorithm and Kalman filter are used to associate the current frame with pre-matched crater data, which can quickly identify the crater.
In terms of algorithm speed, Tables 2 and 4 suggest that, on a 1024 × 1024 image, the detection speed can reach 9.43 fps and the recognition speed can reach 25.22 fps. In terms of algorithm robustness, Figures 2 and 12 suggest that using the bounding box instead of the center point for direct matching is more robust to noise: when the position noise reaches 5% and the angle noise reaches 0.8°, the matching recognition rate still exceeds 90%. Figure 10 suggests that the single-stage object detection network with dense anchor points deals with small and dense craters more effectively; compared with no dense anchor points, the mAP of DPCDN increased by 11.47%.
Moreover, the pose-estimation method in this paper uses the PnP algorithm directly on the results of the crater recognition. How to fuse other sensors, such as inertial measurement units, to achieve higher accuracy and design a long-term stable and real-time pose-estimation algorithm based on craters deserves further attention.

Conclusions
Detection and recognition methods used in existing crater-based pose-estimation systems were analyzed in this paper, and a crater detection and recognition method consisting of two stages was designed. In stage 1, the single-stage crater object detection network with dense anchor points can deal with small and dense scenes and achieved state-of-the-art crater detection performance. In stage 2, the recognition algorithm matches craters in the image with previously identified craters and a pre-established crater database using the KM algorithm. The pattern composed of the target crater and the surrounding craters is encoded, and the distance between encoded features together with the IOU is used as the weight of the KM algorithm. Experimental results show that the F1 score of the detection network is better than those of other detection methods, and the performance with dense anchor points improved by 11.47% compared to without them. Experiments on images of Mars and the Moon show that the proposed DPCDN can handle a variety of scenarios. Matching based on bounding boxes is more robust to noise than direct matching using the crater center. Combining the FFM with the FDM, the recognition achieves real-time speed on the CPU (25 FPS). Relying on a high-accuracy network and sequence image information, we provide an efficient crater detection and recognition method for pose estimation.
Author Contributions: Z.C. is responsible for the research ideas, overall work, the experiments, and the writing of this paper. J.J. provided guidance and modified the paper. All authors have read and agreed to the published version of the manuscript.