Article

A Generative Adversarial Network for Pixel-Scale Lunar DEM Generation from Single High-Resolution Image and Low-Resolution DEM Based on Terrain Self-Similarity Constraint

1 School of Instrumentation Science and Opto-Electronics Engineering, Beijing Information Science & Technology University, Beijing 100192, China
2 State Key Laboratory of Remote Sensing and Digital Earth, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China
3 University of Chinese Academy of Sciences, Beijing 100101, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(17), 3097; https://doi.org/10.3390/rs17173097
Submission received: 27 June 2025 / Revised: 27 August 2025 / Accepted: 3 September 2025 / Published: 5 September 2025
(This article belongs to the Special Issue Planetary Geologic Mapping and Remote Sensing (Second Edition))

Abstract

Lunar digital elevation models (DEMs) are a fundamental data source for lunar research and exploration. However, high-resolution lunar DEM products are available only for limited local areas, which makes it difficult to meet the needs of scientific research and missions. To this end, we previously developed a deep learning-based method (LDEMGAN1.0) for single-image lunar DEM reconstruction. To address issues such as loss of detail in LDEMGAN1.0, this study leverages the inherent structural self-similarity of different DEM data covering the same lunar terrain and proposes an improved version, named LDEMGAN2.0. During training, the model computes the self-similarity graph (SSG) between the output of the LDEMGAN2.0 generator and the ground truth, and incorporates a self-similarity loss (SSL) constraint into the generator loss to guide DEM reconstruction. This improves the network's capacity to capture both local and global terrain structures. Using the LROC NAC DTM product (2 m/pixel) as the ground truth, experiments were conducted in the Apollo 11 landing area. The proposed LDEMGAN2.0 achieved a mean absolute error (MAE) of 1.49 m, a root mean square error (RMSE) of 2.01 m, and a structural similarity index measure (SSIM) of 0.86, representing improvements of 46.0%, 33.4%, and 11.6%, respectively, over LDEMGAN1.0. Both qualitative and quantitative evaluations demonstrate that LDEMGAN2.0 enhances detail recovery and reduces reconstruction artifacts.

1. Introduction

In lunar exploration missions, applications such as lunar surface feature identification, landing site selection, and rover navigation and positioning rely on high-precision, high-resolution lunar digital elevation models (DEMs) [1,2,3]. Lunar DEMs are regular grids with corresponding elevation values that represent surface topography. They contribute to both scientific research and the safety of lunar landing missions. Before landing, DEMs support the study of stratigraphy, volcanic activity, and impact history, as well as analyses of slope, illumination conditions, communication, and flight trajectory, used to determine the landing site and ensure a safe landing [4,5,6]. After landing, DEMs, including those constructed from orbiter images and those constructed from images captured by the lander/rover, play important roles in post-landing operations such as rover navigation and positioning, path planning, and in situ analysis support [7].
In early studies, lunar DEM-generation approaches were broadly classified into two types: stereo photogrammetry and laser altimetry-based interpolation [8]. Stereo photogrammetry requires stereo images covering the target area. However, few high-resolution orbiters are designed for direct stereo imaging, so it is impossible to obtain sufficient multi-view image data to ensure adequate overlap and achieve global multi-coverage observation [9]. Laser altimetry is limited by launch plans and orbit coverage density [10]. Owing to differences in orbit altitude and a large laser spot size, the distribution of lunar surface elevation points is not uniform, and the spacing between sampling points is large [11]. For example, LOLA's sampling strategy produces five parallel profiles along LRO's sub-spacecraft ground track; profiles are 10–12 m apart, with observations within each profile separated by 56 m [12]. Consequently, the accuracy of DEMs reconstructed through elevation interpolation varies across regions, making it challenging to achieve high-resolution, high-precision DEMs.
Given the scarcity of high-resolution stereo or multi-coverage images of the lunar surface, recent studies have focused on extracting lunar elevation information from a single image to construct lunar DEMs [13]. Single-image DEM reconstruction methods are primarily based on surface reflectance modeling, such as shape from shading (SFS) and shape and albedo from shading (SAFS) [14,15]. These methods leverage variations in surface brightness to estimate parameters such as relative heights or surface normal vectors at each point, thereby deriving the elevation information and generating terrain data. However, single-image SFS often requires extra constraints to produce reliable results [16], such as using regularization models [17] or incorporating low-resolution DEMs [18], which improve both accuracy and robustness in shape recovery and absolute elevation values.
The advancement of deep learning has transformed the unconstrained single-image 3D reconstruction problem into a nonlinear mapping problem [19]. Neural networks can learn the "hidden" mapping relationships between two types of data and perform end-to-end transformations given sufficient training samples [20]. Deep neural networks extract features from input monocular images and utilize the depth information inherent in these features to generate high-resolution DEMs [21]. These methods mainly include convolutional neural networks (CNNs) and generative adversarial networks (GANs) [22]. CNN-based methods are trained with a large number of images and corresponding depth information to implicitly establish a mapping between images and normalized depth information, enabling terrain reconstruction from a single image [23]. However, these methods rely on high-quality large-scale data and struggle to accurately recover terrain scale information under monocular conditions. To address this challenge, researchers introduced low-resolution DEMs as global constraints, which improved scale-recovery accuracy [24]. Although low-resolution DEMs improve the overall accuracy and consistency of terrain reconstruction, the results still struggle to capture fine-scale terrain features and remain heavily reliant on the quality of the low-resolution DEM data. Subsequently, researchers combined low-resolution DEMs and image data as inputs to a CNN, allowing the network to learn multi-scale terrain features and generate more accurate high-resolution DEMs [25]. With the development of deep neural networks, GANs offer self-supervised adversarial learning frameworks for both image generation and transformation. Later approaches incorporate multi-scale reconstruction strategies alongside single-scale optimization inference mechanisms, effectively addressing the trade-off between reconstruction resolution and global consistency [26,27].
Additionally, the combination of single images with low-resolution DEMs has been integrated into GAN frameworks [28,29], improving the accuracy and robustness of single-image DEM reconstruction. However, when significant resolution discrepancies exist between input and output DEMs, existing methods frequently exhibit artifacts such as detail loss and spurious structures in texture-rich areas. This limitation may arise from the generator's inadequate representational capacity, while discriminators may struggle to effectively identify and constrain subtle differences induced by resolution gaps or inherent model limitations within these texture-rich areas. Thus, mitigating artifact generation (e.g., detail loss and spurious structures) in texture-rich areas constitutes a pivotal challenge for achieving high-fidelity, high-resolution DEM reconstruction.
Terrain self-similarity originates from the similarity of statistical laws and structural patterns across different DEM data of the same terrain. Introducing a self-similarity loss into GANs helps the generator maintain consistency with the real terrain, effectively improving the authenticity and detail accuracy of the generated images and enhancing the generalization ability of the model. Therefore, based on the principle that the same terrain exhibits self-similarity across different DEM data, this paper improves upon the previously developed single-image lunar DEM reconstruction network (LDEMGAN1.0) [29], which suffers from issues such as detail loss and spurious structures (features that appear in the DEM reconstructed by LDEMGAN1.0 but are absent in the corresponding real images) in complex terrain, and proposes an enhanced version named LDEMGAN2.0. By constructing a self-similarity graph (SSG) and calculating a self-similarity loss (SSL) to guide the training of the DEM reconstruction network, this approach quantifies the correlations between the generated DEM and the ground truth, thereby reducing information loss and achieving high-resolution DEM reconstruction. In this study, a Lunar Reconnaissance Orbiter Camera (LROC) Narrow Angle Camera (NAC) digital terrain model (DTM) is used as the ground truth to reconstruct pixel-scale DEMs from LROC NAC imagery of the Apollo 11 landing area. Please note that, for lunar topography, DEM and DTM are interchangeable, as no vegetation or buildings exist. A comprehensive evaluation was conducted by comparing the DEMs generated by LDEMGAN2.0, LDEMGAN1.0, and the Ames Stereo Pipeline (ASP) SFS [30] against the ground truth to assess the reconstruction accuracy and generalization performance of the proposed model.

2. Materials and Methods

2.1. Previous Work: LDEMGAN1.0

We previously developed a single-image lunar DEM reconstruction network (LDEMGAN1.0) based on the GAN model [29]. This network uses a single high-resolution digital orthophoto map (DOM) image combined with a low-resolution DEM as input. The low-resolution DEM serves as auxiliary control information, allowing the network to capture low-frequency terrain features. For this study, the LRO LOLA digital elevation model co-registered with SELENE data 2015 (SLDEM2015, ~60 m/pixel) was selected as the low-resolution DEM and up-sampled to the same spatial resolution as the input DOM prior to network input. The reconstruction process comprises two phases: network training and DEM reconstruction. In the network training phase, high-resolution DOMs are combined with SLDEM2015 resampled to the same spatial resolution to create dual-channel image data. These data are used as input to LDEMGAN1.0, with the corresponding NAC DTM serving as the ground truth for supervised training of the reconstruction network. In the DEM reconstruction phase, high-resolution DOMs and the up-sampled SLDEM2015 data covering the same region are fed into LDEMGAN1.0 to produce DEMs with the same spatial resolution as the input images. Since the input data are normalized before entering the generative network for better convergence, the output values of the network fall within [0, 1] as well. Therefore, the output is scaled to convert the generated DEM from relative elevations to absolute values. Although LDEMGAN1.0 has shown promising performance in DEM-reconstruction tasks, it exhibits certain limitations in representing complex terrain structures and preserving high-frequency details (e.g., impact crater rims, ridges, and textured geological features) in some reconstruction results. These limitations may cause minor structural distortions and partial detail loss, leading to local inconsistencies in the geometry and texture of the reconstructed terrain, as shown in Figure 1.
We posit that this may be attributed to the inadequate consideration of the local elevation differences between the reconstructed DEM and the ground truth within the same terrain. Accordingly, in LDEMGAN2.0, we introduced a terrain self-similarity constraint to regularize the network training process.
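As a rough illustration of the dual-channel input described above, the following sketch stacks a normalized DOM band with the up-sampled SLDEM2015 band. The min–max normalization and the helper name are our assumptions, not the paper's exact preprocessing:

```python
import numpy as np

def make_dual_channel(dom: np.ndarray, sldem_upsampled: np.ndarray) -> np.ndarray:
    """Stack a DOM tile and the up-sampled SLDEM2015 tile into a (2, H, W) input.

    Both bands are min-max normalized to [0, 1], matching the normalization
    the paper states is applied before the generative network.
    """
    def norm(a: np.ndarray) -> np.ndarray:
        return (a - a.min()) / (a.max() - a.min() + 1e-12)
    return np.stack([norm(dom), norm(sldem_upsampled)], axis=0)
```

The first band carries the high-frequency image texture; the second carries the low-frequency absolute-elevation context.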

2.2. Improvement in This Work: LDEMGAN2.0

In order to address the issues identified in the LDEMGAN1.0 network, we propose an improved version, termed LDEMGAN2.0, which leverages the inherent structural self-similarity in different DEM data of the same lunar terrain. Compared to LDEMGAN1.0, LDEMGAN2.0 constructs SSGs between the reconstructed DEM and the ground truth to calculate SSL during training, which in turn guides parameter updates, optimizes the network architecture, and supervises the generation of image structures and details. The training process is illustrated in Figure 2.
During the network training process, the NAC DOM and SLDEM2015, preprocessed and resampled to 2 m/pixel, are used as inputs to LDEMGAN2.0 for DEM reconstruction. The reconstructed DEM is then compared with the ground truth NAC DTM, and SSGs are calculated for the LDEMGAN2.0 generator output and NAC DTM, focusing on the regions with edge and texture information on the ground truth. The resulting SSL is calculated and integrated into the generator’s loss function to guide the optimization of the entire training process.

2.3. DEM Self-Similarity Loss

Based on the principle that the terrain morphology and surface texture in different lunar DEM data exhibit similar statistical distribution patterns and scale invariance at the same lunar terrain, we utilize the principle of self-similarity to calculate SSGs between the reconstruction results of the LDEMGAN2.0 generator and the ground truth during the network training process. SSL is then integrated into the loss function to guide the recovery of the details and structure in the reconstructed DEM.

2.3.1. Self-Similarity Principle

Self-similarity is a statistical property of images: for images of the same area, their structural patterns and texture features exhibit similar statistical laws and scale invariance across different datasets or resolutions [31]. It is widely applied in fields such as image registration, feature matching, and super-resolution reconstruction in computer vision and remote sensing [32]. Figure 3 illustrates the self-similarity of different DEM data of the same lunar region.

2.3.2. Binary Mask Image

Using the above principle, we calculated the self-similarity between the generator's output DEM and the ground-truth DEM to construct the SSGs. However, calculating global SSGs incurs a high training cost. In fact, it is unnecessary to compute the self-similarity for every pixel: we aimed for LDEMGAN2.0 not only to maintain low training costs, but also to focus on edge and texture areas to reduce reconstruction artifacts, especially around high-frequency features. Therefore, we used a binary mask of edge and texture pixels of the ground-truth DEM [33] to indicate where the SSGs should be calculated during training.
The Laplace operator is denoted as $L$, the ground truth as $\mathrm{DTM}_{\mathrm{NAC}}$, and the result of the convolution operation as $E$, as shown in Equation (1):

$$E = L \ast \mathrm{DTM}_{\mathrm{NAC}} \qquad (1)$$
Regions where $E$ exceeds the empirical threshold $t$ are considered to contain significant edge and texture information and are marked as 1; regions where $E$ falls below $t$ are considered smooth and are marked as 0. This forms a binary mask, denoted $\mathrm{Mask}$, as shown in Equation (2):

$$\mathrm{Mask} = \begin{cases} 0, & E \le t \\ 1, & E > t \end{cases} \qquad (2)$$

Here, $t$ is the threshold, empirically set to 165 to preserve most true edge pixels while filtering out smooth terrain.
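The mask construction of Equations (1) and (2) can be sketched as follows. The threshold t = 165 follows the text, while the 4-neighbour Laplacian kernel and edge padding are our assumptions (the paper does not state the kernel layout):

```python
import numpy as np

def edge_texture_mask(dtm: np.ndarray, t: float = 165.0) -> np.ndarray:
    """Mark pixels whose Laplacian response exceeds t as edge/texture (1), else 0."""
    pad = np.pad(dtm, 1, mode="edge")
    # 4-neighbour Laplacian response: up + down + left + right - 4 * centre
    e = np.abs(pad[:-2, 1:-1] + pad[2:, 1:-1]
               + pad[1:-1, :-2] + pad[1:-1, 2:] - 4.0 * dtm)
    return (e > t).astype(np.uint8)
```

Flat terrain yields a response near zero and is masked out; crater rims and other sharp elevation breaks exceed the threshold and are retained for SSG computation.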

2.3.3. Self-Similarity Graph

Centered on the pixels marked as 1, search areas of 5 × 5 pixels are defined in both the LDEMGAN2.0 DEM and the NAC DTM. A local sliding window of 3 × 3 pixels is then applied within each search area; the windows for the LDEMGAN2.0 DEM and the NAC DTM are denoted PLDEMGAN2.0 and PNAC, respectively. As shown in Equation (3), the squared differences between each pixel and its neighboring pixels in PLDEMGAN2.0 and PNAC are calculated, denoted $d_{\mathrm{LDEMGAN2.0}}^{2}(\mu_{i,j})$ and $d_{\mathrm{NAC}}^{2}(\eta_{i,j})$.
$$d_{\mathrm{LDEMGAN2.0}}^{2}(\mu_{i,j}) = \frac{1}{c}\sum_{i=0}^{f}\sum_{j=0}^{f}\left(\mu_{i,j}-\mu_{i+1,j+1}\right)^{2}, \qquad d_{\mathrm{NAC}}^{2}(\eta_{i,j}) = \frac{1}{c}\sum_{i=0}^{f}\sum_{j=0}^{f}\left(\eta_{i,j}-\eta_{i+1,j+1}\right)^{2} \qquad (3)$$
Here, $c$ is the number of channels in the image, $f$ is the block size, and $\mu_{i+1,j+1}$, $\eta_{i+1,j+1}$ are the neighboring pixels of the center pixels $\mu_{i,j}$, $\eta_{i,j}$ in the local sliding window.
Therefore, the self-similarity values of PLDEMGAN2.0 and PNAC are denoted $S(d_{\mathrm{LDEMGAN2.0}}^{2}(\mu_{i,j}))$ and $S(d_{\mathrm{NAC}}^{2}(\eta_{i,j}))$, as shown in Equation (4):

$$S\left(d_{\mathrm{LDEMGAN2.0}}^{2}(\mu_{i,j})\right) = e^{-d_{\mathrm{LDEMGAN2.0}}^{2}(\mu_{i,j})/h}, \qquad S\left(d_{\mathrm{NAC}}^{2}(\eta_{i,j})\right) = e^{-d_{\mathrm{NAC}}^{2}(\eta_{i,j})/h} \qquad (4)$$

Here, $h$ is a proportional factor with $h > 0$, and $0 \le S(d^{2}(\mu_{i,j})) \le 1$; the closer the value is to 1, the higher the similarity between $\mu_{i,j}$, $\eta_{i,j}$ and $\mu_{i+1,j+1}$, $\eta_{i+1,j+1}$.
In order to enhance the contrast of small values and reduce storage requirements and memory usage, a logarithmic transformation is applied to map the values $S(d_{\mathrm{LDEMGAN2.0}}^{2}(\mu_{i,j}))$ and $S(d_{\mathrm{NAC}}^{2}(\eta_{i,j}))$ to the range 0–255, denoted $\mathrm{SSG}(S(d^{2}(\mu_{i,j})))$, as shown in Equation (5):

$$\mathrm{SSG}\left(S(d^{2}(\mu_{i,j}))\right) = \frac{\log\left(1+S(d^{2}(\mu_{i,j}))\right)}{\log\left(1+S_{\max}(d^{2}(\mu_{i,j}))\right)} \times 255 \qquad (5)$$

where $S_{\max}(d^{2}(\mu_{i,j}))$ is the maximum self-similarity value.
The SSGs of PLDEMGAN2.0 and PNAC are generated using Equation (5), denoted as SSGpredict and SSGNAC, respectively. These graphs illustrate the distribution of inherent structural similarities in the DEMs, as shown in Figure 4.
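A simplified sketch of Equations (3)–(5) follows. For clarity it computes $d^{2}$ densely for every pixel of a single-channel patch and takes the mean over the eight 3 × 3-window neighbours; the paper restricts computation to masked edge/texture pixels within 5 × 5 search areas, so the dense evaluation and mean form are our simplifications:

```python
import numpy as np

def self_similarity_graph(dem: np.ndarray, h: float = 0.004) -> np.ndarray:
    """Map a single-channel DEM patch to its 8-bit self-similarity graph."""
    pad = np.pad(dem, 1, mode="edge")
    H, W = dem.shape
    d2 = np.zeros_like(dem, dtype=float)
    # Mean squared difference between each centre pixel and its 8 neighbours
    # in the 3x3 window (Equation (3), with channel count c = 1).
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            neigh = pad[1 + di:1 + di + H, 1 + dj:1 + dj + W]
            d2 += (dem - neigh) ** 2
    d2 /= 8.0
    s = np.exp(-d2 / h)          # Equation (4): similarity in (0, 1]
    s_max = s.max()
    # Equation (5): logarithmic stretch to the 0-255 range.
    return np.log1p(s) / np.log1p(s_max) * 255.0
```

On perfectly uniform terrain every similarity equals 1 and the graph saturates at 255; sharp elevation breaks drive the exponential toward 0 and stand out as dark structures, which is exactly what the SSL then compares between prediction and ground truth.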

2.3.4. Loss Function

Based on the principle of self-similarity, this paper leverages the inherent structural self-similarity of different DEM data from the same lunar terrain. By calculating SSL between the LDEMGAN2.0 DEM and the ground truth as a constraint, the network training process is optimized, enhancing the network’s ability to constrain both local and global terrain structures and improving the accuracy of terrain reconstruction.
SSGpredict and SSGNAC represent the SSGs of the LDEMGAN2.0 DEM and the ground truth, respectively. The differences between them are used to supervise network training. In this paper, the self-similarity loss is constructed using the Kullback–Leibler (KL) divergence and a regularization loss [33], as denoted in Equation (6):
$$L_{\mathrm{SSG}} = D_{\mathrm{KL}}\left(\mathrm{SSG}_{\mathrm{NAC}} \,\|\, \mathrm{SSG}_{\mathrm{predict}}\right) + \varepsilon \left\lVert \mathrm{SSG}_{\mathrm{predict}} - \mathrm{SSG}_{\mathrm{NAC}} \right\rVert \qquad (6)$$
Here, $\varepsilon$ is a balance parameter and $D_{\mathrm{KL}}(\cdot)$ is the KL divergence, used to quantify the distribution difference between SSGNAC and SSGpredict, as shown in Equation (7):
$$D_{\mathrm{KL}}\left(\mathrm{SSG}_{\mathrm{NAC}} \,\|\, \mathrm{SSG}_{\mathrm{predict}}\right) = H\left(\mathrm{SSG}_{\mathrm{NAC}}, \mathrm{SSG}_{\mathrm{predict}}\right) - H\left(\mathrm{SSG}_{\mathrm{NAC}}\right) \qquad (7)$$
Among them, H(SSGNAC,SSGpredict) represents the cross-entropy between SSGNAC and SSGpredict, which quantifies the reconstruction error between SSGNAC and SSGpredict, and H(SSGNAC) represents the entropy of SSGNAC, reflecting the inherent information content in SSGNAC.
$L_{\mathrm{SSG}}$ is added to the LDEMGAN1.0 generator's loss function $L_{G}$ [29], yielding the total loss $L_{\mathrm{Gtotal}}$, as shown in Equation (8). The model is then retrained.
$$L_{\mathrm{Gtotal}} = L_{G} + \delta L_{\mathrm{SSG}} \qquad (8)$$
where $L_{G} = \alpha L_{\mathrm{Huber}} + \beta L_{\mathrm{grad}} + \gamma L_{\mathrm{GAN}} + L_{D}$, and $\alpha, \beta, \gamma, \delta$ are balance parameters.
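The self-similarity loss of Equations (6) and (7) can be sketched numerically as below. Normalizing the two SSGs into probability distributions before the KL term, the L1 form of the regularization norm, and the default value of $\varepsilon$ are all our assumptions; the paper states only the two-term structure:

```python
import numpy as np

def ssl_loss(ssg_pred: np.ndarray, ssg_nac: np.ndarray, eps: float = 0.1) -> float:
    """L_SSG = D_KL(SSG_NAC || SSG_predict) + eps * ||SSG_predict - SSG_NAC||."""
    # Treat each SSG as a probability distribution over pixels (our assumption).
    p = ssg_nac.ravel() / ssg_nac.sum()
    q = ssg_pred.ravel() / ssg_pred.sum()
    # Equation (7): D_KL(p || q) = H(p, q) - H(p) = sum p * log(p / q).
    kl = float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))
    # Regularization term, here as a mean absolute difference.
    reg = float(np.mean(np.abs(ssg_pred - ssg_nac)))
    return kl + eps * reg
```

In training, this scalar would be weighted by $\delta$ and added to the generator loss $L_{G}$ per Equation (8).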

2.4. Network Training

The dataset for network training consists of samples created from 10 selected regions, shown as yellow points in Figure 5. To ensure optimal reconstruction quality, we selected images with solar elevation angles within our empirically suggested range of 25–70° (specifically, within 45–70°), and avoided images containing shadows with normalized grayscale values approaching 0 or overexposed regions with values approaching 1. The training data consist of LROC NAC DOMs (0.5–2 m/pixel) covering these regions, along with the corresponding LROC NAC DTMs (2 m/pixel) and SLDEM2015 (~60 m/pixel). These data can be downloaded from https://ode.rsl.wustl.edu/moon/ (accessed on 1 May 2025). Information on the LROC NAC DOM images used is shown in Table 1.
In this paper, the training data are resampled to a resolution of 2 m/pixel for subsequent comparative analysis under the same experimental conditions. The resampled LROC NAC DOM and the corresponding SLDEM2015 for the same area serve as the first and second bands, respectively, of synthesized dual-channel image data, with the LROC NAC DTM serving as the ground truth. To enhance the generalization ability of the network, data augmentation techniques are applied, including training samples covering different lunar geomorphologies (e.g., impact craters, volcanoes, and ridges) and samples generated by rotation or flipping. As a result, a total of 51,222 sets of 256 × 256 pixel dual-band input data and corresponding high-resolution DEMs are generated and divided into a training set and a validation set in an 8:2 ratio for model training and performance evaluation.
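The rotation/flip augmentation mentioned above could look like the following sketch. The essential point, applying the identical transform to the dual-channel input and its ground-truth DEM, is from the text; the helper name and parameterization are ours:

```python
import numpy as np

def augment(x: np.ndarray, y: np.ndarray, k: int, flip: bool):
    """Rotate input x (C, H, W) and target y (H, W) by k*90 degrees,
    optionally flipping left-right, using the same transform for both."""
    xa = np.rot90(x, k, axes=(-2, -1))
    ya = np.rot90(y, k, axes=(-2, -1))
    if flip:
        xa = np.flip(xa, axis=-1)
        ya = np.flip(ya, axis=-1)
    return xa.copy(), ya.copy()
```

Enumerating k in {0, 1, 2, 3} with and without flipping yields up to eight geometrically distinct versions of each sample.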
For the reconstruction network training, the experimental hyperparameters are set as follows: block size of 256 × 256 pixels, batch size of 64, and 300 training epochs. The initial learning rate is set to 2 × 10−4, begins to decay after 100 epochs, and is linearly reduced to 10−6. The coefficient in Equation (4) is set to h = 0.004, which effectively captures fine structures in the image while avoiding excessive amplification of noise. The coefficients in Equation (8) are set to α = 0.5, β = 0.5, γ = 0.8, and δ = 0.5. The GPU used in the experimental hardware environment is an NVIDIA GeForce RTX 4090, with 64 GB of machine RAM. The experiments are implemented in Python 3.10 and the PyTorch 2.3.1 + cu121 deep learning framework, with CUDA 12.1 used for acceleration. The total training time was 36 h. To ensure the comparability of the experimental results, all experiments in this paper are conducted in the same environment and with identical parameter settings.
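The learning-rate schedule above (constant at 2 × 10−4 for the first 100 epochs, then linearly decayed to 10−6 by epoch 300) can be written as a small function; the linear form between the stated endpoints is what the text implies, while the function name is ours:

```python
def learning_rate(epoch: int, lr0: float = 2e-4, lr_min: float = 1e-6,
                  decay_start: int = 100, total_epochs: int = 300) -> float:
    """Constant lr0 until decay_start, then linear decay to lr_min at total_epochs."""
    if epoch < decay_start:
        return lr0
    frac = (epoch - decay_start) / (total_epochs - decay_start)
    return lr0 + frac * (lr_min - lr0)
```

This matches the common "linear decay after warm period" scheme exposed by lambda-based schedulers in PyTorch.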

3. Results

3.1. Experimental Area

To evaluate the reconstruction performance of LDEMGAN2.0, the Apollo 11 landing area, located around 0.77°N, 23.44°E, is selected as the experimental area; it differs from the training data areas. The location is indicated by the red dot in Figure 5. The area is mostly flat with minor terrain undulations and is primarily composed of lunar mare basalts, predominantly low-lying basalt plains. Small impact craters are scattered locally, with diameters ranging from several to hundreds of meters [34].

3.2. Evaluation Indicators

To evaluate the reconstruction performance of the improved algorithm model, this paper uses the mean absolute error (MAE) and root mean square error (RMSE) as statistical evaluation metrics to quantitatively assess the elevation accuracy deviation, overall error distribution, and local error variations.
MAE is defined in statistics as the average of the absolute differences between the predicted values and the ground truth. It is used to measure the network’s average prediction error, as shown in Equation (9):
$$\mathrm{MAE}\left(y_i, \hat{y}_i\right) = \frac{1}{n}\sum_{i=1}^{n}\left| y_i - \hat{y}_i \right| \qquad (9)$$
where $n$ is the number of samples, $y_i$ the ground truth, and $\hat{y}_i$ the network's predicted value. Please note that, as there is no absolute elevation reference, the LROC NAC DTM is treated as the ground truth.
RMSE is a commonly used metric in statistics and regression analysis to measure the average deviation between the network’s predicted values and the ground truth, as shown in Equation (10):
$$\mathrm{RMSE}\left(y_i, \hat{y}_i\right) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left( y_i - \hat{y}_i \right)^{2}} \qquad (10)$$
Additionally, to ensure that the DEM evaluation not only captures numerical accuracy deviations, but also reflects the overall stability and spatial consistency of the terrain structure, the structural similarity index measure (SSIM) was used to assess the similarity between the reconstructed DEM and the NAC DTM in terms of spatial feature distribution, terrain gradient changes, and texture details [35], as shown in Equation (11):
$$\mathrm{SSIM}\left(y_i, \hat{y}_i\right) = \frac{\left(2\mu_{\hat{y}_i}\mu_{y_i} + C_1\right)\left(2\sigma_{\hat{y}_i y_i} + C_2\right)}{\left(\mu_{\hat{y}_i}^{2} + \mu_{y_i}^{2} + C_1\right)\left(\sigma_{\hat{y}_i}^{2} + \sigma_{y_i}^{2} + C_2\right)} \qquad (11)$$
where $\mu_{\hat{y}_i}$ and $\mu_{y_i}$ are the means of $\hat{y}_i$ and $y_i$, respectively, $\sigma_{\hat{y}_i}^{2}$ and $\sigma_{y_i}^{2}$ their variances, $\sigma_{\hat{y}_i y_i}$ their covariance, and $C_1$, $C_2$ constants.
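For reference, minimal NumPy versions of Equations (9)–(11) are sketched below. The SSIM constants $C_1 = (0.01\,L)^2$ and $C_2 = (0.03\,L)^2$ for data range $L$ follow the common convention, which the paper does not specify, and a single global window is used rather than the sliding Gaussian window of many implementations:

```python
import numpy as np

def mae(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Equation (9): mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Equation (10): root mean square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def ssim(y: np.ndarray, y_hat: np.ndarray, data_range: float = 255.0) -> float:
    """Equation (11): global (single-window) structural similarity."""
    c1 = (0.01 * data_range) ** 2   # conventional stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mu_y, mu_h = y.mean(), y_hat.mean()
    var_y, var_h = y.var(), y_hat.var()
    cov = ((y - mu_y) * (y_hat - mu_h)).mean()
    num = (2 * mu_h * mu_y + c1) * (2 * cov + c2)
    den = (mu_h ** 2 + mu_y ** 2 + c1) * (var_h + var_y + c2)
    return float(num / den)
```

MAE and RMSE are reported in metres against the NAC DTM, while SSIM is dimensionless with 1 indicating identical structure.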
The elevation differences between each reconstructed DEM (LDEMGAN2.0, LDEMGAN1.0, ASP SFS [30]) and the ground truth DEM are calculated individually to generate absolute difference maps for error analysis. Additionally, a hillshade map is created to simulate the terrain illumination variations under the same solar altitude angle of the input images. Finally, elevation profiles are used to analyze and compare the morphological characteristics.

3.3. Reconstruction and Analysis

3.3.1. Analysis and Evaluation of Overall Reconstruction Results

In this study, the images used for testing network performance were M150368601LE and M150368601RE, i.e., the left (LE) and right (RE) NAC Experimental Data Record images, with a solar elevation angle of 46.26° and an image resolution of 0.53 m/pixel. Since there is a significant resolution difference between the NAC DOM and SLDEM2015, an iterative reconstruction procedure was adopted. First, the LROC NAC DOM was down-sampled to 10 m/pixel and the SLDEM2015 was up-sampled to the same 10 m/pixel resolution; both were fed into the network to generate a reconstructed DEM at 10 m/pixel. Subsequently, the generated 10 m/pixel DEM was up-sampled to 2 m/pixel and combined with the LROC NAC DOM, also down-sampled to 2 m/pixel, to generate a final DEM at 2 m/pixel resolution. Finally, the mean and standard deviation of the SLDEM2015 were calculated, based on which the reconstructed elevation was stretched to obtain absolute heights. Geographic information was subsequently added to the restored heights, completing the scale restoration. The LROC NAC DTM (2 m/pixel) was used as the ground truth to reconstruct the DEM of the Apollo 11 landing area. Please note that, by configuring the appropriate hierarchical structures, LDEMGAN2.0 can generate DEMs at the same grid resolution as the input data; the down-sampling to 2 m/pixel was primarily used to facilitate comparative analysis with the ground truth.
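The scale-restoration step described above can be sketched as follows. Matching the first and second moments of the SLDEM2015 tile is our reading of "stretched based on the mean and standard deviation", not a verbatim implementation of the paper's code:

```python
import numpy as np

def restore_absolute_height(dem_norm: np.ndarray, sldem: np.ndarray) -> np.ndarray:
    """Map a normalized [0, 1] network output to absolute elevations (metres)
    by matching the mean and standard deviation of the SLDEM2015 tile."""
    mu, sigma = sldem.mean(), sldem.std()
    # Standardize the network output, then rescale to the SLDEM statistics.
    z = (dem_norm - dem_norm.mean()) / (dem_norm.std() + 1e-12)
    return mu + sigma * z
```

The restored grid then receives the georeferencing of the input DOM so it can be compared pixel-by-pixel with the NAC DTM.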
Meanwhile, the key parameters of the ASP SFS DEM reconstruction method were set as follows: the shadow threshold was set to 0.003, and the smoothness weight was set to 0.04 to achieve a balance between detail recovery and terrain continuity. The reflectance model chosen was the Lunar-Lambertian model (reflectance type 1) [36], which is suitable for the lunar surface. The maximum number of iterations was set to 20 to ensure full convergence. To improve the image fusion effect, the blending distance was set to 10 pixels, and the minimum blending area width was set to 100 pixels [30].
The elevation accuracy of the reconstruction results was evaluated using the metrics described above. The reconstruction results of LDEMGAN2.0 in the Apollo 11 experimental area are shown in Figure 6.
The absolute difference maps for the LDEMGAN2.0-reconstructed DEM, LDEMGAN1.0-reconstructed DEM, and ASP SFS DEM, with respect to the NAC DTM, were calculated, as shown in Figure 7.
As shown in Figure 7, the LDEMGAN2.0 network has an enhanced ability to recover terrain details. Approximately 55% of the pixels have an elevation difference of less than 2 m, and about 95% have an elevation difference of less than 10 m; the overall accuracy of the DEM has improved. The statistical distributions of elevation residuals with respect to the ground truth for the Apollo 11 experimental area, for LDEMGAN2.0, LDEMGAN1.0, and ASP SFS, are shown in Figure 8. From the figure, it is evident that the proposed LDEMGAN2.0 has the lowest errors.
Statistics of the overall reconstruction error of the three methods in the experimental area are given in Table 2. As shown in Table 2, using the LROC NAC DTM (2 m/pixel) as the ground truth, the MAE for the entire experimental area is 1.491 m, the RMSE is 2.015 m, and the SSIM is 0.86, representing improvements of 46.0%, 33.4%, and 11.6% over LDEMGAN1.0, and of 29.3%, 45.8%, and 17.8% over ASP SFS, respectively.
In terms of reconstruction speed, LDEMGAN2.0 incorporates the self-similarity loss during training without changing the generator's structure, so its reconstruction speed remains consistent with that of LDEMGAN1.0: it takes 3 min to reconstruct the 4 km × 27 km DEM of the Apollo 11 experimental area. In contrast, ASP SFS requires multiple iterations over each pixel, and its computational workload grows exponentially as the resolution increases. Moreover, LDEMGAN2.0 is a pre-trained model that does not need case-by-case optimization, whereas ASP SFS, akin to neural radiance fields (NeRF) and Gaussian Splatting, requires substantial per-scene optimization for each mapping task. Under the same experimental conditions, to ensure that the SFS reconstruction results are comparable to the other methods in spatial resolution and detail representation, enabling subsequent quantitative analysis and comparison, we used the key parameters described in Section 3.3.1 for ASP SFS. Reconstructing the 2 m/pixel DEM of the experimental area in this way took approximately 12.5 h in total.

3.3.2. Detailed Analysis of the Reconstruction Results

To further analyze the reconstruction results, this study selected four 1 km × 1 km sub-regions (labeled I–IV in Figure 9) for further examination. For comparison with the ground truth, the image resolution was down-sampled to 2 m/pixel and overlaid on the SLDEM2015, as shown in Figure 9.
Figure 10 shows hillshade maps of the LDEMGAN2.0 DEM, LDEMGAN1.0 DEM, ASP SFS DEM, and NAC DTM for the zoomed-in sub-regions I–IV shown in Figure 9. As shown in Figure 10, the elevation information in the LDEMGAN2.0 DEM is finer than that in LDEMGAN1.0, with significant improvements in detail recovery and terrain consistency. The elevation changes in sub-region I are relatively minor, and the DEMs reconstructed by all three methods exhibit high consistency with the ground truth. For the small impact craters with diameters of approximately 160 m and 55 m located in the lower-right corners of sub-regions II and III, indicated by the red arrows, the DEM reconstructed by ASP SFS exhibits irregular shapes (e.g., deformed impact crater rims). Such small-scale, illumination-sensitive terrain is prone to edge perturbations caused by noise and to distortions of crater-floor morphology induced by photometric artifacts, resulting in irregular shapes. In contrast, LDEMGAN2.0 calculates the SSL by leveraging the self-similarity of the same terrain across different DEM data, thereby constraining the training of the DEM reconstruction network. This effectively suppresses detail loss while preserving local detail features. In sub-regions II and IV, the dashed boxes highlight that the LDEMGAN1.0 DEM displays additional terrain details, such as spurious structures with diameters of approximately 50 m. However, comparison with the original NAC DOM shows that these fine terrain features are not actual surface features. By integrating the structural self-similarity loss during training, LDEMGAN2.0 effectively suppressed these unrealistic reconstructions, improving the model's ability to represent the actual terrain structure and maintain consistency.
Compared to LDEMGAN1.0, LDEMGAN2.0 effectively reduces structural distortions and detail loss in some cases, enhances terrain consistency (i.e., the LDEMGAN2.0 DEM agrees more closely with the NAC DTM in overall terrain morphology, including topographic trends and structural details), and better preserves geomorphic features. It even avoids the stripe artifacts present in the NAC DTM in sub-region IV.
Based on the reconstruction results of LDEMGAN2.0, the reconstruction error statistics for the four selected sub-regions (labeled I–IV in Figure 9) are presented in Table 3. Sub-region I shows the smallest elevation change, 42.76 m. Sub-regions II and III exhibit relatively large terrain fluctuations, 76.00 m and 69.10 m, respectively, and sub-region IV shows an elevation change of 62.01 m. Regarding accuracy, sub-region I achieves an RMSE of 2.78 m and an SSIM of 0.79. Sub-region II, which exhibits the largest elevation change, achieves an RMSE of 2.23 m and an SSIM of 0.85, suggesting that the model is robust and generalizes well in such areas. Sub-region III performs well across all evaluation metrics, with an MAE of 1.25 m, an RMSE of 2.06 m, and an SSIM of 0.84, indicating that its reconstruction is highly consistent with the ground truth in both overall structure and local details. Sub-region IV shows an RMSE of 2.74 m, a maximum error of 7.75 m, and an SSIM of 0.77.
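Statistics of the kind reported in Table 3 can be computed directly from co-registered DEM arrays; a minimal sketch (function name illustrative; SSIM, which requires a windowed computation, is omitted here):

```python
import numpy as np

def dem_error_stats(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Elevation error statistics between a reconstructed DEM and the
    ground-truth DEM, both given as co-registered elevation grids in meters."""
    diff = pred - truth
    return {
        "elev_change": float(truth.max() - truth.min()),   # relief of the region
        "mae": float(np.mean(np.abs(diff))),               # mean absolute error
        "rmse": float(np.sqrt(np.mean(diff ** 2))),        # root mean square error
        "max_err": float(np.max(np.abs(diff))),            # maximum elevation error
    }
```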
In addition, we conducted a detailed profile analysis of the four selected 1 km × 1 km sub-regions (labeled I–IV in Figure 9). For each sub-region, we extracted three DEM profiles to show the terrain differences and provide a detailed comparison among the LDEMGAN2.0 DEM, LDEMGAN1.0 DEM, ASP SFS DEM, and NAC DTM, as shown in Figure 11.
Figure 11 shows that the DEM reconstructed using LDEMGAN2.0 preserves the terrain undulation characteristics of the ground truth. The maximum elevation errors between the LDEMGAN2.0 DEM and the ground truth (NAC DTM) are 7.01 m, 4.72 m, 5.24 m, and 7.75 m in sub-regions I, II, III, and IV, respectively. These results indicate that LDEMGAN2.0 maintains a high degree of consistency with the ground truth in overall trends and shows better detail-preservation ability. The elevation profiles in sub-region I (Figure 11a) and sub-region IV (Figure 11d) further support the conclusion from Figure 10 that LDEMGAN2.0 effectively reduces detail loss and spurious structures, enhancing the terrain authenticity of the DEM. This is attributed to LDEMGAN1.0's lack of a constraint on the self-similarity between the reconstructed DEM and the ground truth over the same terrain, which results in weak constraints on local features in areas with strong texture information. In contrast, most of the fine terrain features reconstructed by LDEMGAN2.0 align with the ground truth, demonstrating improved stability and consistency in terrain detail recovery compared to LDEMGAN1.0.
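A profile comparison of this kind amounts to sampling each co-registered DEM along the same line segment; a minimal bilinear-sampling sketch (function name and the (row, col) endpoint convention are assumptions, not the authors' implementation):

```python
import numpy as np

def dem_profile(dem: np.ndarray, p0, p1, n: int = 100) -> np.ndarray:
    """Sample n elevations along the segment p0 -> p1, each given as
    (row, col) in pixel coordinates, using bilinear interpolation."""
    rows = np.linspace(p0[0], p1[0], n)
    cols = np.linspace(p0[1], p1[1], n)
    # Integer cell origins, clipped so the 2x2 interpolation stencil fits.
    r0 = np.clip(np.floor(rows).astype(int), 0, dem.shape[0] - 2)
    c0 = np.clip(np.floor(cols).astype(int), 0, dem.shape[1] - 2)
    fr, fc = rows - r0, cols - c0          # fractional offsets within the cell
    top = dem[r0, c0] * (1 - fc) + dem[r0, c0 + 1] * fc
    bot = dem[r0 + 1, c0] * (1 - fc) + dem[r0 + 1, c0 + 1] * fc
    return top * (1 - fr) + bot * fr
```

Sampling all four DEMs with the same endpoints yields directly comparable elevation curves such as those plotted in Figure 11.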
In summary, the analysis of the four sub-regions shows that incorporating the self-similarity loss improves LDEMGAN2.0's ability to restore terrain details, particularly in areas with significant changes in crater shape, slope, and elevation gradient. The reconstruction results align more closely with the NAC DTM in both overall trends and local fluctuations, demonstrating improved structural fidelity and terrain consistency. This makes LDEMGAN2.0 more suitable for high-precision DEM reconstruction tasks.

4. Discussion

This study experimentally evaluated a method for reconstructing high-resolution DEMs based on the self-similarity of different DEM data for the same lunar terrain. Based on the findings and challenges encountered in this research, the following points should be further explored and improved in future work:
(1) In our previous work [29], when the solar altitude angle is within approximately 25–70°, the reconstruction accuracy remains stable and the errors are within an acceptable range. Outside this range, factors such as underexposure or overexposure of the image may introduce additional reconstruction errors. Based on this experience, we suggest employing imagery acquired within the recommended range, and either excluding imagery acquired under extreme illumination conditions or preprocessing it before use, so as to ensure stable network convergence and maintain reconstruction accuracy.
(2) In the current high-resolution DEM-reconstruction task, a single data source is often limited by factors such as acquisition conditions, observation angles, and spatial resolution, making it difficult to fully capture the complexity of terrain structures. Thus, optimizing multi-source data-fusion strategies is a crucial direction for improving accuracy and robustness. Moreover, attention should be paid to the spatial registration and error-propagation mechanisms between data sources to ensure geometric consistency and elevation accuracy during the fusion process. This will provide a solid foundation for high-quality DEM reconstruction.
(3) The approach proposed in this study can be applied to other regions of the Moon (e.g., the south pole) [37,38] and to the surfaces of other planets through transfer learning with a relatively small amount of additional data. For the Moon's south pole, where illumination conditions are much worse than in the data used in this study, preprocessing methods such as image enhancement should be considered, and images captured at different times and from different sources (e.g., ShadowCam onboard the Korea Pathfinder Lunar Orbiter) should be employed. For other planetary applications, such as Mars, training data covering geomorphologies that do not exist on the Moon (e.g., aeolian features) should be added.
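The illumination screening recommended in point (1) reduces to a simple threshold test on the solar altitude angle; a trivial sketch (function name is hypothetical, and the default bounds are taken from the 25–70° recommendation):

```python
def usable_illumination(sun_altitude_deg: float,
                        lo: float = 25.0, hi: float = 70.0) -> bool:
    """True if the solar altitude angle of an image falls within the
    recommended range for stable DEM reconstruction accuracy."""
    return lo <= sun_altitude_deg <= hi
```

Images failing this test would either be excluded from training or routed through a preprocessing step such as image enhancement.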

5. Conclusions

This paper proposes a DEM reconstruction network, LDEMGAN2.0, an improved version of LDEMGAN1.0, that utilizes the self-similarity of different DEM data over the same lunar terrain. During training, the model incorporates the SSL constraint into the generator loss to guide DEM reconstruction, addressing issues such as detail loss and spurious structures in complex terrain that affected the previous network. To evaluate the performance of LDEMGAN2.0, we conducted experiments in the Apollo 11 landing area. The reconstructed DEM was compared with the LDEMGAN1.0 DEM, the ASP SFS DEM, and the NAC DTM using both qualitative analysis and quantitative metrics, including MAE, RMSE, and SSIM.
The experimental results show that, compared to LDEMGAN1.0 and ASP SFS, the DEM reconstructed by LDEMGAN2.0 achieves the best metrics: over the entire experimental area, the MAE is 1.49 m, the RMSE is 2.01 m, and the SSIM is 0.86, representing improvements of 46.0%, 33.4%, and 11.6% over LDEMGAN1.0, and of 29.3%, 45.8%, and 17.8% over ASP SFS, respectively. These results validate the feasibility and effectiveness of the improved reconstruction network. Moreover, they show that adding the SSL to the DEM reconstruction network during training significantly enhances the network's ability to recover local details. This mechanism enables the network to learn the similarity between SSGs at the local scale, thereby strengthening the constraints on detail and structural consistency in DEM reconstruction and helping to reduce detail loss and the appearance of spurious structures during reconstruction.
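The quoted percentages follow directly from the per-method values in Table 2; for instance (helper name is illustrative; note that for error metrics lower is better, while for SSIM higher is better):

```python
def improvement_pct(new: float, old: float, lower_is_better: bool = True) -> float:
    """Relative improvement of a metric value over a baseline, in percent."""
    delta = (old - new) if lower_is_better else (new - old)
    return 100.0 * delta / old

# MAE: 1.49 m (LDEMGAN2.0) vs 2.76 m (LDEMGAN1.0)  -> ~46.0%
# RMSE: 2.01 m vs 3.02 m                            -> ~33.4%
# SSIM: 0.86 vs 0.77 (higher is better)             -> ~11.7%, quoted as 11.6%
mae_gain = improvement_pct(1.49, 2.76)
```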
This method is valuable for scientific and engineering applications such as terrain feature analysis, landing site selection, navigation and localization, path planning, and decision support.

Author Contributions

Conceptualization, Y.W. and T.C.; methodology, T.C. and Y.W.; software, T.C. and Y.W.; validation, Y.W., W.-C.L., B.L. and K.D.; formal analysis, Y.W. and K.D.; investigation, K.D., Y.W. and S.C.; resources, C.Z., B.W. and B.X.; data curation, T.C. and Y.W.; writing—original draft preparation, T.C.; writing—review and editing, Y.W., J.N., W.-C.L. and K.D.; visualization, T.C. and Y.W.; supervision, K.D., Y.W. and S.C.; project administration, T.C. and Y.W.; funding acquisition, K.D. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (Grant No. 2022YFF0503100), International Partnership Program of Chinese Academy of Sciences (for Grand Challenges, No. 313GJHZ2024034GC), and Science and Disruptive Technology Program, AIRCAS (Grant No. 2024-AIRCAS-SDTP-13).

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors thank all those who worked on the NASA Planetary Data System archive to make the LROC imagery and SLDEM2015 available.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ASP: Ames Stereo Pipeline
CNN: Convolutional neural network
DEM: Digital elevation model
DTM: Digital terrain model
DOM: Digital orthophoto map
GAN: Generative adversarial network
KL: Kullback–Leibler
LE: Left-experimental data record
LOLA: Lunar Orbiter Laser Altimeter
LROC: Lunar Reconnaissance Orbiter camera
MAE: Mean absolute error
NAC: Narrow angle camera
NeRF: Neural radiance field
RE: Right-experimental data record
RMSE: Root mean squared error
SFS: Shape from shading
SAFS: Shape and albedo from shading
SSL: Self-similarity loss
SSG: Self-similarity graph
SLDEM2015: LRO LOLA Digital Elevation Model Co-registered with Selene Data 2015
SSIM: Structural similarity index measure

References

  1. Liu, J.J.; Zeng, X.G.; Li, C.L.; Ren, X.; Yan, W.; Tan, X.; Zhang, X.X.; Chen, W.L.; Zuo, W.; Liu, Y.X.; et al. Landing Site Selection and Overview of China’s Lunar Landing Missions. Space Sci. Rev. 2021, 217, 6–30. [Google Scholar] [CrossRef]
  2. Di, K.C.; Liu, B.; Liu, Z.Q.; Zou, Y.L. Review and prospect of lunar mapping using remote sensing data. J. Remote Sens. 2016, 20, 1230–1242. [Google Scholar]
  3. Formisano, M.; De Sanctis, M.C.; Boazman, S.; Frigeri, A.; Heather, D.; Magni, G.; Teodori, M.; De Angelis, S.; Ferrari, M. Thermal modeling of the lunar South Pole: Application to the PROSPECT landing site. Planet. Space Sci. 2024, 251, 105969. [Google Scholar] [CrossRef]
  4. Fassett, C.I.; Head, J.W.; Kadish, S.J.; Mazarico, E.; Neumann, G.A.; Smith, D.E.; Zuber, M.T. Lunar impact basins: Stratigraphy, sequence and ages from superposed impact crater populations measured from Lunar Orbiter Laser Altimeter (LOLA) data. J. Geophys. Res. Planets 2012, 117, E00H06. [Google Scholar] [CrossRef]
  5. Wu, B.; Li, F.; Ye, L.; Qiao, S.; Huang, J.; Wu, X.Y.; Zhang, H. Topographic mapping and analysis at the landing site of Chang’E-3 on the moon. Earth Planet. Sci. Lett. 2014, 405, 257–273. [Google Scholar] [CrossRef]
  6. Wu, B.; Li, F.; Hu, H.; Zhao, Y.; Wang, Y.R.; Xiao, P.P.; Li, Y.; Liu, W.C.; Chen, L.; Ge, X.M.; et al. Topographic and geomorphological mapping and analysis of the Chang’E-4 landing site on the far side of the Moon. Photogramm. Eng. Remote Sens. 2020, 86, 247–258. [Google Scholar] [CrossRef]
  7. Di, K.C.; Liu, Z.Q.; Wan, W.H.; Peng, M.; Liu, B.; Wang, Y.X.; Gou, S.; Yue, Z.Y. Geospatial technologies for Chang’e-3 and Chang’e-4 lunar rover missions. Geo-Spat. Inf. Sci. 2020, 23, 87–97. [Google Scholar] [CrossRef]
  8. Wu, B.; Guo, J.; Zhang, Y.; King, B.A.; Li, Z.; Chen, Y. Integration of Chang’E-1 Imagery and Laser Altimeter Data for Precision Lunar Topographic Modeling. IEEE Trans. Geosci. Remote Sens. 2011, 49, 4889–4903. [Google Scholar]
  9. Liu, C.B.; Bo, Z.; Zhang, P.; Zhou, M.Y.; Liu, W.Y.; Huang, R.; Niu, R.; Ye, Z.; Yang, H.Z.; Liu, S.J.; et al. Current status and future prospects of lunar topographic remote sensing and mapping. J. Geo-Inf. Sci. 2025, 27, 801–819. [Google Scholar]
  10. Shan, J.; Tian, X.; Li, S.; Li, R.F. Advances of spaceborne laser altimetry technology. Acta Geod. Cartogr. Sin. 2022, 51, 964–982. [Google Scholar]
  11. Wu, B.; Hu, H.; Guo, J. Integration of Chang’E-2 imagery and LRO laser altimeter data with a combined block adjustment for precision lunar topographic modeling. Earth Planet. Sci. Lett. 2014, 391, 1–15. [Google Scholar] [CrossRef]
  12. Smith, D.E.; Zuber, M.T.; Neumann, G.A.; Lemoine, F.G.; Mazarico, E.; Torrence, M.H.; McGarry, J.F.; Rowlands, D.D.; Head, J.W., III; Duxbury, T.H.; et al. Initial observations from the lunar orbiter laser altimeter (LOLA). Geophys. Res. Lett. 2010, 37, L18204. [Google Scholar] [CrossRef]
  13. Wu, B.; Liu, W.C.; Grumpe, A.; Wöhler, C. Shape and Albedo from Shading (SAFS) for pixel-level DEM generation from monocular images constrained by low-resolution DEM. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 41, 521–527. [Google Scholar] [CrossRef]
  14. Liu, W.C.; Wu, B.; Wöhler, C. Effects of illumination differences on photometric stereo shape-and-albedo-from-shading for precision lunar surface reconstruction. ISPRS J. Photogramm. Remote Sens. 2018, 136, 58–72. [Google Scholar]
  15. Horn, B.K.P. Shape from Shading: A Method for Obtaining the Shape of a Smooth Opaque Object from One View. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 1970. [Google Scholar]
  16. Ju, Y.K.; Zhou, J.C.; Zhou, S.T.; Xie, H.; Zhang, C.; Xiao, J.; Yang, C.X.; Sun, J. Three-dimensional reconstruction of underwater side-scan sonar images based on shape-from-shading and monocular depth fusion. Intell. Mar. Technol. Syst. 2024, 2, 4. [Google Scholar] [CrossRef]
  17. Santo, H.; Samejima, M.; Matsushita, Y. Numerical shape-from-shading revisited. IPSJ Trans. Comput. Vis. Appl. 2018, 10, 8. [Google Scholar] [CrossRef]
  18. Grumpe, A.; Belkhir, F.; Wöhler, C. Construction of lunar DEMs based on reflectance modelling. Adv. Space Res. 2014, 53, 1735–1767. [Google Scholar] [CrossRef]
  19. Buyukdemircioglu, M.; Kocaman, S.; Kada, M. Deep Learning for 3D Building Reconstruction: A Review. In Proceedings of the 24th ISPRS Congress on Imaging Today, Foreseeing Tomorrow, Nice, France, 6–11 June 2022. [Google Scholar]
  20. Masoumian, A.; Rashwan, H.A.; Cristiano, J.; Asif, M.S.; Puig, D. Monocular Depth Estimation Using Deep Learning: A Review. Sensors 2022, 22, 5353. [Google Scholar] [CrossRef] [PubMed]
  21. Cao, J.R.; Huang, R.; Ye, Z.; Xu, Y.S.; Tong, X.H. Generative 3D Reconstruction of Martian Surfaces using Monocular Images. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2024, 48, 51–56. [Google Scholar] [CrossRef]
  22. Ge, L. Research on 3D Reconstruction Methods Based on Deep Learning. Trans. Comput. Sci. Intell. Syst. Res. 2024, 5, 678–684. [Google Scholar] [CrossRef]
  23. Eigen, D.; Puhrsch, C.; Fergus, R. Depth Map Prediction from a Single Image using a Multi-Scale Deep Network. In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
  24. Chen, Z.; Wu, B.; Liu, W.C. Mars3DNet: CNN-Based High-Resolution 3D Reconstruction of the Martian Surface from Single Images. Remote Sens. 2021, 13, 839. [Google Scholar] [CrossRef]
  25. Chen, H.; Hu, X.Y.; Gläser, P.; Xiao, H.F.; Ye, Z.; Zhang, H.Y. CNN-Based Large Area Pixel-Resolution Topography Retrieval From Single-View LROC NAC Images Constrained With SLDEM. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 9398–9416. [Google Scholar] [CrossRef]
  26. Tao, Y.; Xiong, S.T.; Conway, S.J.; Muller, J.P.; Guimpier, A.; Fawdon, P.; Thomas, N.; Cremonese, G. Rapid Single Image-Based DTM Estimation from ExoMars TGO CaSSIS Images Using Generative Adversarial U-Nets. Remote Sens. 2021, 13, 2877. [Google Scholar] [CrossRef]
  27. Tao, Y.; Muller, J.; Xiong, S.; Conway, S. MADNet 2.0: Pixel-Scale Topography Retrieval from Single-View Orbital Imagery of Mars Using Deep Learning. Remote Sens. 2021, 13, 4220. [Google Scholar] [CrossRef]
  28. Grassa, R.L.; Gallo, I.; Re, C.; Cremonese, G.; Landro, N.; Pernechele, C.; Simioni, E.; Gatti, M. An Adversarial Generative Network Designed for High-Resolution Monocular Depth Estimation from 2D HiRISE Images of Mars. Remote Sens. 2022, 14, 4619. [Google Scholar] [CrossRef]
  29. Liu, Y.; Wang, Y.X.; Di, K.C.; Peng, M.; Wan, W.H.; Liu, Z.Q. A Generative Adversarial Network for Pixel-Scale Lunar DEM Generation from High-Resolution Monocular Imagery and Low-Resolution DEM. Remote Sens. 2022, 14, 5420. [Google Scholar] [CrossRef]
  30. Alexandrov, O.; Beyer, R.A. Multiview Shape-From-Shading for Planetary Images. Earth Space Sci. 2018, 5, 652–666. [Google Scholar] [CrossRef]
  31. Bodensteiner, C.; Huebner, W.; Juengling, K.; Mueller, J.; Arens, M. Local multi-modal image matching based on self-similarity. In Proceedings of the IEEE International Conference on Image Processing, Hong Kong, China, 26–29 September 2010. [Google Scholar]
  32. Chang, K.; Ding, P.; Li, B.X. Single image super-resolution using collaborative representation and non-local self-similarity. Signal Process. 2018, 149, 49–61. [Google Scholar] [CrossRef]
  33. Chen, D.; Zhang, Z.Q.; Liang, J.; Zhang, L. SSL: A Self-similarity Loss for Improving Generative Image Super-resolution. In Proceedings of the 32nd ACM International Conference on Multimedia (MM ’24), New York, NY, USA, 28 October–1 November 2024. [Google Scholar]
  34. Iqbal, W.; Hiesinger, H.; van der Bogert, C. Geological mapping and chronology of lunar landing sites: Apollo 11. Icarus 2019, 333, 528–547. [Google Scholar] [CrossRef]
  35. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  36. McEwen, A.S. Photometric functions for photoclinometry and other applications. Icarus 1991, 92, 298–311. [Google Scholar] [CrossRef]
  37. Kereszturi, A. Polar Ice on the Moon. In Encyclopedia of Lunar Science; Cudnik, B., Ed.; Springer: Cham, Switzerland, 2023; pp. 971–980. [Google Scholar]
  38. Long, X.X.; Wang, C.; Sanlang, S.J.; Feng, Y.J.; Xu, X.; Xie, H.; Tong, X.H. Retrieving water ice abundance in representative regions of the Moon’s South Pole using topography corrected SELENE MI images. Geosci. Lett. 2025, 12, 33. [Google Scholar] [CrossRef]
Figure 1. LDEMGAN1.0 exhibits minor spurious structures and a degree of detail loss in some cases. The dashed box highlights local spurious structures in the LDEMGAN1.0 DEM, while red arrows point to areas with detail loss.
Figure 2. LDEMGAN2.0 training process.
Figure 3. Schematic diagram of the self-similarity of the lunar surface appearing in DEMs with different data and resolutions. (a) NAC DTM (2 m/pixel), (b) NAC DTM (60 m/pixel), (c) SLDEM2015 (2 m/pixel), (d) SLDEM2015 (60 m/pixel). The impact craters (highlighted with red squares) exhibit inherent structural self-similarity across different data and resolutions.
Figure 4. Illustration of the SSGs calculation process. (a) A mask is constructed based on the ground truth to indicate regions where the SSG should be calculated during the training process. (b) For pixels marked as 1, a 5 × 5 pixel calculation area is defined, and a 3 × 3 pixel sliding window is used to calculate the SSG within the area.
Figure 5. Distribution of data for training and experiment (Projection: Orthographic; False Easting: 0.0°; False Northing: 0.0°; Longitude of center: 0.0°; Latitude of center: 0.0°). The yellow points represent the training regions, while the red point represents the experimental area.
Figure 6. Reconstruction results of the Apollo 11 experimental area rendered with NAC DOM are shown in 3D-perspective view, with two zoomed-in views. (Height exaggeration factor: 10). The scalebar corresponds to the entire Apollo 11 experimental area.
Figure 7. Comparison of absolute difference maps overlaid on the SLDEM2015: (a) absolute difference map between the DEM reconstructed by LDEMGAN2.0 and the NAC DTM; (b) absolute difference map between the DEM reconstructed by LDEMGAN1.0 and the NAC DTM; (c) absolute difference map between the DEM reconstructed by ASP SFS and the NAC DTM; (d) NAC DTM (2 m/pixel).
Figure 8. Box-plot of elevation residuals. Statistical results of detailed elevation residuals with the ground truth for the reconstruction results of the Apollo 11 experimental area using LDEMGAN 2.0, LDEMGAN 1.0, and ASP SFS.
Figure 9. LROC NAC DOM of the experimental area overlaid on SLDEM2015. Four sub-regions are marked in red rectangles and labeled for further analysis.
Figure 10. Local terrain reconstruction results of sub-region I–IV. The dashed box highlights local false textures and structural artifacts, while red arrows point to areas with obvious reconstruction errors or seams.
Figure 11. Profiles of the 2 m/pixel LDEMGAN2.0 DEM, LDEMGAN1.0 DEM, ASP SFS DEM, and NAC DTM in (a) sub-region I, (b) sub-region II, (c) sub-region III, and (d) sub-region IV (locations shown in Figure 9).
Table 1. Information of the training data.
ID | Area | LROC NAC Image ID
1 | Apollo 12 | M120005333LE, M120005333RE
2 | Apollo 13 | M160139273LE, M160139273RE
3 | Apollo 14 | M150633128LE, M150633128RE
4 | Apollo 16 | M159847331LE, M159847331RE
5 | Apollo 17 | M144708115LE, M144708115RE
6 | LUNA 16 | M159582808LE, M159582808RE
7 | LUNA 24 | M144219225LE, M144219225RE
8 | RANGER 6 | M144483945LE, M144483945RE
9 | RANGER 9 | M12930938LE, M12930938RE
10 | RIMA SHARP 3 | M173252954LE, M173252954RE
Table 2. Statistics of the overall reconstruction error in the experimental area.
Method | Reconstruction Error <2 m (%) | Reconstruction Error <4 m (%) | Reconstruction Error <10 m (%) | MAE (m) | RMSE (m) | SSIM | Rebuild Speed (h)
LDEMGAN2.0 | 56.16 | 87.25 | 95.10 | 1.49 | 2.01 | 0.86 | ~0.05
LDEMGAN1.0 | 48.32 | 73.06 | 91.54 | 2.76 | 3.02 | 0.77 | ~0.05
ASP SFS | 42.17 | 66.24 | 88.02 | 2.11 | 3.71 | 0.73 | ~12.5
Table 3. Statistics of reconstruction errors for the sub-regions shown in Figure 9, based on LDEMGAN2.0.
Sub-Region | Elevation Change (m) | MAE (m) | RMSE (m) | SSIM | Maximum Error (m)
I | 42.76 | 1.92 | 2.78 | 0.79 | 7.01
II | 76.00 | 1.84 | 2.23 | 0.85 | 4.72
III | 69.10 | 1.25 | 2.06 | 0.84 | 5.24
IV | 62.01 | 1.77 | 2.74 | 0.77 | 7.75

