1. Introduction
Humanity’s next step in space exploration involves establishing scientific laboratories and permanent lunar bases on the Moon, supporting deeper space exploration and scientific research efforts [1,2,3]. High-precision 3D lunar surface topography data are essential for the planning and construction of research stations and other exploration activities. Currently, available lunar topographic data from missions like the LRO NAC, Chang’e programs, and recent rover imagery provide valuable high-resolution information. However, these datasets are primarily derived from orbital or rover-based observations, which limit their precision in capturing localized lunar surface features. To achieve centimeter- or millimeter-level precision, in situ stereo vision systems on the lunar surface are necessary. These systems capture high-resolution image pairs and perform disparity estimation through stereo matching, enabling the accurate calculation of disparities between images captured by dual cameras [4]. By integrating stereo disparities with known camera parameters, these systems can generate highly detailed 3D measurements that are critical for navigation, hazard detection, and the construction of lunar research stations.
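As a concrete illustration of this disparity-to-depth step, the standard rectified-stereo triangulation relation Z = f·B/d can be sketched as follows. The focal length, baseline, and disparity values used below are illustrative assumptions, not parameters of any mission camera:

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Standard rectified-stereo triangulation: Z = f * B / d.

    disparity_px: disparity between the two views, in pixels (> 0)
    focal_px:     camera focal length, in pixels
    baseline_m:   distance between the two camera centres, in metres
    Returns the depth Z in metres.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


# Illustrative numbers: a 2000 px focal length, 0.3 m baseline,
# and 12 px disparity give a depth of 50 m.
z = depth_from_disparity(12.0, 2000.0, 0.3)
```

Because Z varies inversely with d, a fixed disparity error translates into a depth error that grows quadratically with range, which is why sub-pixel disparity accuracy matters for distant terrain.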
However, the unique visual environment of the Moon poses significant challenges to traditional stereo matching techniques. The absence of an atmosphere, combined with the low albedo and retroreflective properties of the lunar regolith, results in extreme brightness contrasts between sunlit and shadowed areas. This produces imagery with a very high dynamic range, with sharp transitions between very bright and very dark regions [5]. Additionally, at the lunar south pole, a key target for missions like Chang’e 7 and Chang’e 8 and for future scientific station construction [6], the sun angle is very low, so the lunar surface is frequently obscured by large shadows cast by rocks and craters. Moreover, the lunar surface is composed of monotonous material types, leading to a scarcity of texture information in lunar imagery. These characteristics substantially increase the difficulty of disparity estimation and 3D reconstruction in lunar scenes.
Another significant challenge is the limited availability and homogeneity of stereo datasets specific to the lunar surface, which restricts the effective validation of stereo matching algorithms. Widely used stereo matching datasets, such as KITTI [7], Middlebury [8], and SceneFlow [9], are primarily based on urban or static everyday scenes, which differ markedly from the lunar environment. Consequently, they are not suitable for validating stereo matching algorithms intended for lunar scenarios. In practice, many studies on stereo matching algorithms for lunar scenes [10,11] still rely on these publicly available datasets, which undermines the validity of the results. While some simulated lunar datasets have been developed [12], they remain largely uniform and lack the diversity necessary to capture the varied conditions of the lunar environment. These datasets generally provide only coarse approximations and do not reflect the specific lighting variations and solar angles characteristic of different lunar regions. This lack of dataset variety and realism restricts the ability to thoroughly validate and benchmark algorithms, limiting their adaptability and effectiveness in real lunar scenarios.
Stereo matching algorithms can be categorized into two main types based on their constraints and search strategies [13]: local algorithms and global algorithms. Local algorithms compute and aggregate costs using information such as image intensity or gradient within a defined search range [14,15,16], and then generate the disparity map using the winner-takes-all (WTA) strategy. For example, NASA’s Spirit and Opportunity rovers utilized a correlation-based window stereo matching algorithm [17]. However, these methods are highly sensitive to variations in image brightness. To address this issue, stereo matching algorithms based on the Sum of Absolute Differences (SAD) and Sum of Squared Differences (SSD) were later developed and widely adopted. For example, the Jet Propulsion Laboratory (JPL) employed an improved SAD algorithm for detecting planetary slopes and rocks [18], while reference [19] used SSD to estimate the DEM of the planetary surface. Despite their widespread use, these two algorithms exhibit limited robustness when dealing with the sparse textures and extreme lighting conditions typical of lunar scenes, resulting in low disparity estimation accuracy. Traditional local window matching algorithms determine the match for the central pixel by comparing the grayscale similarity of neighboring pixels within the window. This approach is prone to errors because projection distortions cause grayscale values within the window to misalign, and the pixel-by-pixel calculation is time-consuming. To mitigate these issues, Zabih proposed the non-parametric rank and Census transforms [20]. These algorithms still rely on pixel information but replace absolute intensities with relative ordering, reducing the impact of gain and bias and partially addressing the illumination problem. The Tianwen-1 stereo camera employs a sparse Census transform to alleviate grayscale variations caused by uneven lighting [21]. However, challenges remain in accurately estimating disparities in regions with extensive shadow occlusion and weak textures. Overall, these local methods perform poorly under the extreme lighting conditions, weak textures, and shadow occlusions encountered in lunar scenes, making them less effective for stereo matching in complex lunar environments.
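For reference, the classic (unmodified) Census transform discussed above can be sketched as follows. This is a minimal illustration of the baseline technique only, not the improved variant proposed later in this paper; window size and data types are illustrative choices:

```python
import numpy as np

def census_transform(img, win=5):
    """Classic Census transform: encode each pixel as a bit string that
    records, for every neighbour in a win x win window, whether the
    neighbour is darker than the centre pixel."""
    h, w = img.shape
    r = win // 2
    pad = np.pad(img, r, mode='edge')          # replicate borders
    desc = np.zeros((h, w), dtype=np.uint64)   # up to 7x7 (48 bits) fits
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue                        # skip the centre pixel
            neighbour = pad[r + dy : r + dy + h, r + dx : r + dx + w]
            desc = (desc << np.uint64(1)) | (neighbour < img).astype(np.uint64)
    return desc

def hamming_cost(desc_l, desc_r):
    """Matching cost between two Census descriptors = Hamming distance
    (number of differing bits)."""
    x = desc_l ^ desc_r
    count = np.zeros_like(x, dtype=np.uint64)
    while x.any():                              # simple popcount loop
        count += x & np.uint64(1)
        x >>= np.uint64(1)
    return count
```

Because only the ordering of intensities enters the descriptor, a uniform gain or offset applied to the whole image leaves the bit strings, and hence the matching cost, unchanged, which is exactly the gain/bias robustness noted above.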
Compared with local methods, global methods operate over the entire image domain by constructing an energy function with a data term and a smoothness term and then optimizing this function to obtain a smooth and accurate disparity map. These methods can more effectively address matching ambiguities, particularly in discontinuous and weakly textured areas, with techniques such as graph cuts (GCs) [22] and dynamic programming (DP) [23]. For example, a study [24] proposed an adaptive Markov stereo matching model for deep space exploration imagery, which achieved relatively accurate disparity estimation by adaptively determining the disparity range and combining it with window matching algorithms. However, this method is computationally intensive and inefficient.
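The energy function mentioned above is commonly written in the following standard MRF form; the notation here is the conventional one and is assumed rather than taken from the cited works:

```latex
E(D) = \underbrace{\sum_{p} C(p, d_p)}_{\text{data term}}
     \; + \;
     \underbrace{\sum_{p} \sum_{q \in \mathcal{N}(p)} V(d_p, d_q)}_{\text{smoothness term}}
```

where D is the disparity map, C(p, d_p) is the matching cost of assigning disparity d_p to pixel p, N(p) is the set of neighbours of p, and V penalizes disparity differences between neighbouring pixels. Graph cuts and dynamic programming are two strategies for (approximately) minimizing this energy.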
Hirschmüller proposed the Semi-Global Matching (SGM) algorithm [25], which combines the advantages of both local and global methods, achieving a well-balanced trade-off between accuracy and computational complexity. The Planetary Robotics 3D Viewer (PRo3D) utilizes the SGM stereo matching algorithm to process a large volume of stereo images captured by the Mars Exploration Rover (MER) mission [26]. Another study [27] built a coarse-to-fine pyramid framework based on the SGM algorithm to improve the accuracy of stereo matching in lunar surface scenes. However, when dealing with textureless regions, SGM is prone to mismatches, leading to inaccurate disparity maps. In SGM, the traditional 2D Markov Random Field (MRF) optimization problem is approximated by a set of 1D linear optimizations along various directions, which enhances efficiency while maintaining a certain level of accuracy. Nevertheless, since each path independently aggregates the matching costs without sharing information, insufficient data can lead to matching voids, resulting in inaccurate final matches. This limitation makes it challenging for SGM to handle the unique lighting conditions and weakly textured regions on the lunar surface. Therefore, it remains crucial to explore high-precision stereo matching methods tailored to the lunar environment.
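The 1D path aggregation at the core of SGM can be sketched for a single left-to-right path. The recurrence follows Hirschmüller's formulation with small-jump penalty P1 and large-jump penalty P2; the penalty values below are illustrative defaults, not tuned parameters:

```python
import numpy as np

def aggregate_path_lr(cost, P1=10.0, P2=120.0):
    """Aggregate matching costs along one left-to-right path (one SGM direction).

    cost: (W, D) array of per-pixel matching costs for a single image row,
          where W is the row width and D the number of disparity candidates.
    Returns the path-aggregated costs L_r with the same shape.
    """
    W, D = cost.shape
    L = np.empty((W, D), dtype=np.float64)
    L[0] = cost[0]
    for x in range(1, W):
        prev = L[x - 1]
        prev_min = prev.min()
        # Candidate transitions from the previous pixel:
        # same disparity (no penalty), +/-1 disparity (penalty P1),
        # any larger disparity jump (penalty P2).
        minus = np.roll(prev, 1);  minus[0] = np.inf    # from d-1 to d
        plus  = np.roll(prev, -1); plus[-1] = np.inf    # from d+1 to d
        best = np.minimum.reduce([prev,
                                  minus + P1,
                                  plus + P1,
                                  np.full(D, prev_min + P2)])
        # Subtracting prev_min keeps the values bounded along the path.
        L[x] = cost[x] + best - prev_min
    return L
```

In full SGM the same recurrence is evaluated along 8 or 16 directions and the per-direction costs are summed before the WTA disparity selection; because each path runs independently, as noted above, no information is shared between directions.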
In this paper, we first utilized the advanced Unreal Engine 4 (UE4) rendering engine for high-fidelity physical simulation of complex lunar surface scenes. This not only compensates for the lack of lunar surface data but also serves as a new test benchmark for algorithm research and validation. Furthermore, we propose a stereo matching method for lunar scenes based on an improved Census transform with multi-feature fusion cost calculation and superpixel disparity optimization. In the matching cost calculation stage, we designed a new Census transform and incorporated image color intensity information into the cost, which reduces the effects of extreme lighting and weak textures in lunar images. In the superpixel disparity optimization stage, we divide the image into superpixels that serve as the basic processing units for disparity optimization, which improves the accuracy of disparity estimation in shadowed areas of the lunar surface. Finally, we validated our method using images from high-fidelity physical simulations of complex lunar scenes. The results show that our approach significantly enhances the accuracy of disparity estimation and provides more reliable technical support for high-precision 3D reconstruction of the lunar surface. The main contributions of this paper are as follows:
- 1. In response to the scarcity of diverse lunar scene datasets, we develop a high-fidelity simulation method using the Unreal Engine 4 (UE4) rendering engine to recreate complex lunar environments. This approach not only provides a new benchmark dataset but also offers a replicable method for generating realistic lunar scene simulations to support stereo matching research.
- 2. We develop an improved Census transform, which mitigates noise interference and enhances robustness against lighting variations. By incorporating image color features into the improved Census transform, we also improve the accuracy of disparity estimation in regions with weak or repetitive textures.
- 3. We propose a superpixel disparity refinement method, including global disparity optimization and shadow area disparity filling, which effectively improves the disparity estimation accuracy in detailed areas and shadow-occluded areas.
- 4. Extensive experiments show that our method significantly improves the accuracy and robustness of disparity estimation in lunar scenes and achieves state-of-the-art (SOTA) results.
The remainder of this paper is organized as follows: Section 2 introduces the construction of our high-fidelity physical simulation of complex lunar surface scenes, Section 3 presents the proposed algorithm, Section 4 validates the effectiveness of the proposed method through experiments, and Section 5 concludes the paper.
4. Experimental Results
4.1. Dataset Description
To evaluate the effectiveness of the proposed method, we conducted comprehensive tests using image data obtained from the high-fidelity complex lunar scene physical simulation described in Section 2. In total, we collected 20 sets of high-resolution lunar stereo image pairs for experimental evaluation, with imaging distances ranging from 10 to 100 m in the central area and image resolutions of 1920 × 1080 pixels.
Subsequently, we selected 5 pairs of representative lunar stereo images and their corresponding ground truth disparity maps to present the results, which are labeled as Scene 1, Scene 2, Scene 3, Scene 4, and Scene 5. These images simulate the presence of structures and equipment that might exist during the construction of lunar research stations, collectively forming complex lunar scene features. They cover typical lunar environment features, including high-contrast illumination changes, weak texture features, and shadow occlusion areas due to special lighting conditions. The primary objective of these experiments is to evaluate the proposed algorithm’s ability to handle texture variations and disparity estimation across scenes with different levels of detail. This is critical for ensuring the algorithm’s effectiveness in real-world applications, such as lunar surface exploration and the construction of scientific research stations on the Moon.
In addition, to avoid the possibility that the artificial structures we added might introduce extra textures into the images, we created a high-fidelity simulated scene without any artificial structures, more closely resembling the natural lunar environment, and conducted tests accordingly. We selected three pairs of representative image pairs, labeled as Scene 6, Scene 7, and Scene 8.
4.2. Parameter Settings
During the matching cost calculation, the size of the matching window M × N was set to 7 × 7, with an intensity threshold of 50. For the energy function, the weighting parameter was set to 0.3, and the disparity segment width L in the data term was set to 2. The parameters σ and k for the superpixel segmentation were defined as in method 1, where σ is the Gaussian filter parameter that smooths the input and k affects the size of the segmented superpixels. For the lunar environment dataset, σ was set to 0.1 and k to 150. Additionally, the subsequent disparity optimization step required no parameters, demonstrating the robustness of this optimization.
4.3. Evaluation Metrics
In our experiments, we employed two standard stereo matching evaluation metrics:
and L_RMS, as defined by Equations (
17) and (
18).
represents the percentage of pixels where the disparity estimation deviates from the ground truth by more than a specified threshold
. We set
to 2.0 and 4.0. L_RMS represents the root mean square error between the estimated disparity and the ground truth, providing a comprehensive measure of the overall accuracy of the disparity estimation.
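Both metrics can be computed directly from an estimated and a ground-truth disparity map. The sketch below is a straightforward implementation of the definitions above; reporting the bad-pixel rate as a percentage is an assumption about the paper's convention:

```python
import numpy as np

def bad_pixel_rate(d_est, d_gt, threshold):
    """Percentage of pixels whose absolute disparity error exceeds
    the given threshold (e.g. 2.0 or 4.0 pixels)."""
    err = np.abs(d_est - d_gt)
    return 100.0 * np.count_nonzero(err > threshold) / err.size

def rms_error(d_est, d_gt):
    """Root-mean-square disparity error (L_RMS) over all pixels."""
    return float(np.sqrt(np.mean((d_est - d_gt) ** 2)))
```

In practice, pixels without a valid ground-truth disparity (e.g. sky or unobserved regions) would be masked out before applying either metric.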
4.4. Comparative Analysis
To thoroughly validate the effectiveness of the proposed method, we compared it with several other approaches, including SGM [
25] and StereoBM [
13,
18] algorithms which are widely used in the field of planetary exploration, AD-Census [
28], which [
5] has been a prominent stereo matching algorithm, and the deep learning-based method MC-CNN [
31].
The visual results on our simulated images of the lunar research stations are presented in Figure 8. Panels (b)–(f) show the disparity maps obtained by our proposed method and the other stereo matching methods, while panel (g) shows the ground truth disparity map. The figure shows that SGM and StereoBM exhibit significant errors and disparity inaccuracies in occluded regions. Although AD-Census does not show prominent invalid pixels, it has substantial errors at object boundaries and exhibits poor smoothness. In contrast, our method demonstrates superior performance across all representative scenarios, producing visually smoother, more accurate disparity maps that effectively represent lunar surface scenes.
In addition to the aforementioned qualitative visual assessments, Table 1, Table 2 and Table 3 present a quantitative evaluation of the effectiveness of our proposed method. By comparing the estimated disparity maps with the ground truth disparity maps, we used the bad-pixel rate and L_RMS to assess quality. The results in Table 1, Table 2 and Table 3 clearly indicate that our method outperforms the other methods across all evaluated metrics.
To further assess the performance of the proposed method, we conducted tests using a high-fidelity simulation scene that closely replicates the natural lunar environment without the inclusion of any artificial structures. The purpose of this test was to evaluate the algorithm in a more realistic setting, where the natural surface features of the Moon—such as sparse textures, high-contrast lighting, and shadowed regions—are predominant, without the influence of additional textures introduced by man-made elements.
The test results are displayed in Figure 9. As shown, the proposed method demonstrates strong performance in these natural lunar conditions, maintaining accurate disparity estimates even in areas with low texture and significant shadow occlusion. Table 4, Table 5 and Table 6 present the results of our quantitative analysis, which include the bad-pixel rates at both thresholds and the L_RMS metric. The quantitative analysis further corroborates the visual observations from Figure 9. Additionally, the L_RMS value of our method was significantly lower, indicating that our algorithm provides more consistent and accurate disparity estimation compared with the other methods. In conclusion, these tables provide a detailed comparison of our proposed method against traditional stereo matching algorithms, demonstrating its superior performance in handling weak textures, shadow occlusion, and large disparity discontinuities typically encountered in lunar surface environments.
4.5. Ablation Study
To evaluate the effectiveness of our proposed method, we conducted three sets of comparative experiments using images collected from simulated complex lunar scenes. Our process consists of the optimized Census transform multi-feature fusion cost calculation (Opt_Census) and superpixel disparity optimization (SDO), denoted as Opt_Census + SDO. We used the original Census method without any disparity optimization as the baseline for these experiments.
In the first set of comparative experiments, we compared the Opt_Census method with the original Census method while keeping all other components unchanged. The subsequent processing of the matching cost and the primary matching parameters, such as the size of the matching window (7 × 7) and the cost aggregation process, remained consistent with the Census method. This allowed us to specifically assess the improvements introduced by the Opt_Census method. In the second set of experiments, we introduced SDO into the Census method to evaluate the effectiveness of SDO. In the third set of comparative experiments, we combined SDO with the Opt_Census method based on the first set of experiments to evaluate the overall performance of the improved Census method with SDO. The visualization results of these three sets of experiments are shown in Figure 10.
The results of the experiments are displayed in Table 7. It is clear that the Opt_Census method performs better than the original Census method, especially in accurately estimating disparities within shadow occlusion and weak texture regions. In the second and third sets of experiments, the inclusion of SDO significantly enhances both the Census and Opt_Census methods. Notably, Opt_Census + SDO produces the best results across all metrics, showcasing the superiority of our proposed method for estimating disparities in lunar surface scenarios.
4.6. Test on Real Lunar Dataset
To further evaluate the performance of the proposed method, we conducted tests using real stereo images captured by the panoramic camera (PCAM) onboard China’s Yutu-2 lunar rover. This allows us to validate the algorithm’s robustness on actual lunar surface data. However, it is important to note that these images needed to undergo epipolar rectification before the stereo matching process, as accurate alignment is necessary to ensure high-quality disparity estimation. For the experiment, we randomly selected three representative lunar surface scenes from the Yutu-2 dataset. These scenes closely resemble the environments in our synthetic dataset and include typical lunar surface features such as low-texture regions, areas with high dynamic contrast, and shadowed occlusions. These characteristics are known to be particularly challenging for stereo matching algorithms, which makes this dataset a suitable candidate for assessing the robustness of our method.
Figure 11 presents the left-camera image from the selected Yutu-2 panoramic camera stereo image pair, along with the disparity maps estimated by our proposed method and other stereo matching methods for comparison. Each image pair has a resolution of 1176 × 864 pixels. We used the same parameter settings as described in the previous section to ensure consistency across experiments.
The experimental results indicate that the high dynamic contrast of the lunar surface, particularly in regions containing both extreme brightness and deep shadows, was well managed, demonstrating the algorithm’s robustness under such harsh lighting conditions. The three selected stereo pairs encompass typical lunar surface characteristics, including rocks, occlusions, repeated patterns, and low-texture or even textureless areas. As the results show, accurately estimating disparity in shadow-occluded regions remains challenging: algorithms such as SGM and StereoBM struggle to provide accurate disparity estimates in these regions, while our method achieves better results. Additionally, in the third stereo pair, which includes large discontinuous disparity regions such as the black sky background, methods such as AD-Census and MC-CNN produce spurious disparity values. In contrast, our method effectively handles these difficult areas, demonstrating superior performance on extreme disparity discontinuities. The disparity maps generated from the Yutu-2 images, though lacking ground truth for direct comparison, exhibit visually plausible depth estimates that align with known topographical features of the lunar surface.
5. Conclusions
In this paper, we propose a method using the Unreal Engine 4 (UE4) rendering engine for high-fidelity simulations to recreate complex lunar environments, addressing the scarcity of diverse lunar scene datasets. We focused on overcoming the challenges posed to traditional stereo matching by the high-contrast illumination variations resulting from the absence of atmosphere and low albedo of the lunar surface, as well as the sparse texture information due to the unstructured nature of the lunar terrain. To address these issues, we improved the Census transform cost computation method by optimizing the Census transform and incorporating a multi-cost fusion approach. This enhancement mitigated the impact of the Moon’s unique lighting conditions and improved disparity estimation in areas with weak textures. Additionally, we introduced a multi-layer superpixel disparity optimization method, which combines the initial disparity map obtained from the cost computation with superpixels segmented from the input images. This approach significantly improves the accuracy of disparity estimation in regions with extensive shadow occlusion, which are prevalent due to the low solar angles at the lunar poles.
In the experimental section, we conducted a comprehensive evaluation using image data generated from our high-fidelity physical simulation of complex lunar scenes. The results demonstrate that, compared with several representative classical stereo matching methods, our proposed approach effectively addresses the challenges posed by the unique lighting conditions, weak textures, and shadow occlusion present on the lunar surface. Consequently, our method not only enhances the accuracy of disparity estimation in lunar environments but also paves the way for new directions in stereo vision research for deep space exploration. We hope this work will inspire future research aimed at further optimizing stereo matching technology and expanding its applications in space exploration.