1. Introduction
The Digital Elevation Model (DEM) plays a pivotal role in many disciplines, providing essential terrain data for geographic analysis, hydrology, land use planning, environmental protection, and climate research [1]. Global DEM products such as the Shuttle Radar Topography Mission (SRTM) [2,3] and TanDEM-X [4] are widely used around the world. These DEMs are highly accurate over most terrain types and support a wide range of applications; however, they still exhibit significant errors in forested areas.
Forested areas, with their complex terrain and dense vegetation, pose significant challenges to DEM accuracy. The accuracy of radar-derived DEMs in forests depends on the radar wavelength and its penetration into the canopy. For example, SRTM, based on C-band radar (wavelength ∼ 5.6 cm), has limited penetration, and its elevations typically correspond to a mixed height between the forest canopy and the terrain. TanDEM-X uses X-band radar (wavelength ∼ 3.1 cm), which penetrates even less, so its measurements lie closer to the top of the canopy. Meanwhile, DEMs generated from optical stereo imagery (e.g., ASTER GDEM [5] and ALOS World 3D [6]) primarily represent the canopy surface and cannot accurately capture the terrain beneath forest cover. In addition, the coarse spatial resolution and outdated acquisitions of these DEMs limit their suitability for current applications. Some studies have attempted to fuse DEMs from different sources [7] or to apply super-resolution techniques [8,9] to improve existing DEMs. However, because these methods are bounded by the accuracy of the input DEMs, they yield only limited accuracy gains and may even blur terrain details.
LiDAR (Light Detection and Ranging) technology [10,11] offers significant advantages for sub-canopy DEM acquisition owing to its strong vegetation penetration, high vertical resolution, and high horizontal accuracy [12]. Airborne laser scanning provides high resolution and accuracy, making it suitable for detailed terrain mapping and time-sensitive surveys [13,14]. However, its high cost prevents global coverage and long-term continuous acquisition [15]. Spaceborne LiDAR can acquire ground measurements continuously over long periods, providing critical support for sustained monitoring. However, its relatively low (sparse) sampling leaves the data spatially discontinuous. Some studies have attempted to improve existing DEMs using spaceborne LiDAR data [16,17]. However, because the elevation points are sparse, they provide little terrain information for uncovered areas, especially large forested regions. In addition, some researchers have used LiDAR data as ground truth and combined optical imagery [18] or SAR [19] data as features to correct external DEMs, but these features carry limited vertical information. Li [20] corrected external DEMs by combining existing DEMs with land cover types and compared the results across different terrain and land cover types. However, that study paid insufficient attention to forested areas, where the improvement was limited.
In recent years, spaceborne PolInSAR has been widely applied to forest parameter inversion [21,22,23,24]. In particular, L-band PolInSAR, with its strong penetration capability and sensitivity to ground surface information, has shown great potential for terrain mapping and structural inversion in forested areas. PolInSAR is an extension of InSAR technology [25,26]. It can separate the scattering centers of the different scattering mechanisms mixed within the same resolution cell [27], thereby allowing the heights of different layers within the vegetation cover to be determined, as shown in Figure 1. However, traditional PolInSAR inversion methods typically rely on simplified models, such as the random volume over ground (RVoG) model [28,29], which assumes a homogeneous volume structure and an ideal ground return. In practice, factors such as temporal decorrelation, system noise, and baseline selection can significantly degrade inversion accuracy [30]. As a result, the rich information embedded in polarimetric data is not fully exploited, which limits further improvements in PolInSAR accuracy over complex terrain and forested areas.
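For orientation, a standard form of the RVoG model from the PolInSAR literature is given below. It is not a formulation introduced in this paper. The model writes the complex coherence observed in polarization channel $\vec{\omega}$ as

$$\tilde{\gamma}(\vec{\omega}) = e^{i\phi_0}\,\frac{\tilde{\gamma}_v + m(\vec{\omega})}{1 + m(\vec{\omega})}, \qquad \tilde{\gamma}_v = \frac{p}{p_1}\,\frac{e^{p_1 h_v} - 1}{e^{p h_v} - 1}, \qquad p = \frac{2\sigma}{\cos\theta}, \quad p_1 = p + i k_z,$$

The symbols are as follows:

- $\phi_0$ is the ground phase.
- $m(\vec{\omega})$ is the ground-to-volume scattering ratio.
- $h_v$ is the vegetation height.
- $\sigma$ is the mean wave extinction.
- $\theta$ is the incidence angle.
- $k_z$ is the vertical wavenumber.

The sub-canopy terrain height follows from the ground phase as $z_0 = \phi_0 / k_z$, up to the sign convention in use. The simplifications mentioned above enter directly. $\tilde{\gamma}_v$ describes a single homogeneous layer with constant extinction, and the ground term is assumed to be free of decorrelation. Temporal decorrelation and noise therefore push observed coherences away from the modeled values.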
In summary, no single data source meets the requirements of high-precision, long-term, large-scale terrain modeling. Spaceborne LiDAR provides sparse but highly accurate ground measurements, whereas spaceborne PolInSAR offers wide-area, relatively high-resolution coverage that captures continuous three-dimensional structural information, although its accuracy is constrained by model assumptions and noise. The two data sources are therefore complementary, and integrating them effectively to improve DEM accuracy in forested regions has become an urgent problem.
Recently, deep learning has emerged as a powerful tool for overcoming the limitations of traditional methods, particularly in nonlinear modeling and feature representation. However, research on PolInSAR-based sub-canopy terrain inversion using deep learning remains scarce. Some studies have combined deep learning with TomoPolSAR, using coherence matrices as input features to design networks that estimate both vegetation height and sub-canopy terrain [31]. Coherence matrices capture the polarimetric information in PolInSAR data and effectively model the phase relationships between polarization channels. Even so, high-precision elevation reconstruction still depends heavily on radar geometric parameters, and ignoring this geometric information often leads to unreliable inversion results. Most research has concentrated on the inversion of vegetation height [32] and aboveground biomass [33]. For example, Zhang et al. [34] proposed PolGAN, which improves spatial resolution and vertical accuracy by integrating high-resolution PolInSAR data with low-resolution large-footprint LiDAR data. This method uses Generative Adversarial Networks (GANs) with dual discriminators that focus on coherence and spatial structure. Vegetation height and sub-canopy terrain are closely related in PolInSAR data, since both are derived from the decomposition of PolInSAR scattering mechanisms. Consequently, existing methods for vegetation height inversion offer valuable references for sub-canopy terrain inversion.
To fully exploit the vertical information provided by spaceborne PolInSAR in forest scenarios, this paper proposes a deep learning-based workflow and designs a PolInSAR and Spaceborne LiDAR Regression/Classification Network (PSLRC-Net). The method combines spaceborne PolInSAR data with sparse spaceborne LiDAR data to refine external DEMs and produce high-quality DEMs. Existing open DEMs differ in accuracy between forested and non-forested areas, so a binary forest/non-forest labeling approach is introduced. This approach guides the model to treat the two regions separately, improving prediction accuracy in both. The main contributions are as follows:
- 1. Data fusion: We propose a deep learning-based workflow that integrates spaceborne PolInSAR and sparse spaceborne LiDAR data, providing an effective solution to high-precision DEM reconstruction in forested areas. The workflow exploits the complementarity of the data sources and uses data-driven learning to substantially improve the model's predictive capability for complex terrain reconstruction.
- 2. Forest/non-forest labeling: To improve model performance in forested and non-forested areas, we propose a lightweight binary classification method. It combines the accuracy differences of external DEMs between forested and non-forested areas with PolInSAR features to label these areas automatically. This streamlines the processing workflow and significantly reduces computational cost.
- 3. Region-specific optimization: We propose PSLRC-Net, a multi-task learning network that adds a forest/non-forest classification branch to the elevation inversion task to guide region-specific optimization. By coupling the classification and regression tasks, this mechanism directs the model toward the feature differences between regions and adaptively optimizes feature learning. As a result, it significantly improves elevation prediction accuracy in both forested and non-forested areas.
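The minimal NumPy forward pass below illustrates the classification-guided regression in contribution 3. It is a sketch under stated assumptions, not the released PSLRC-Net code:

- It assumes a mixture-of-experts layout with one softmax gate per task, a common multi-gate design.
- A sigmoid forest probability blends a "forest" and a "non-forest" regression tower.
- The class name `TinyMultiTaskHead`, the layer sizes, and the random weights are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyMultiTaskHead:
    """Hypothetical head: shared experts, per-task gates, a classifier and two regression towers."""

    def __init__(self, d_in, d_hidden=16, n_experts=4):
        self.experts = [rng.normal(0, 0.1, (d_in, d_hidden)) for _ in range(n_experts)]
        # One gate per task (assumed layout): classification, forest tower, non-forest tower.
        self.gates = {t: rng.normal(0, 0.1, (d_in, n_experts)) for t in ("cls", "forest", "nonforest")}
        self.w_cls = rng.normal(0, 0.1, d_hidden)
        self.w_forest = rng.normal(0, 0.1, d_hidden)
        self.w_nonforest = rng.normal(0, 0.1, d_hidden)

    def _mix(self, x, task):
        h = np.stack([relu(x @ W) for W in self.experts])  # (n_experts, batch, d_hidden)
        g = softmax(x @ self.gates[task], axis=-1)          # (batch, n_experts)
        return np.einsum("be,ebh->bh", g, h)                # task-specific expert mixture

    def forward(self, x):
        p_forest = sigmoid(self._mix(x, "cls") @ self.w_cls)   # forest probability, (batch,)
        y_forest = self._mix(x, "forest") @ self.w_forest
        y_nonforest = self._mix(x, "nonforest") @ self.w_nonforest
        # The classification output weights the two regression towers.
        y = p_forest * y_forest + (1.0 - p_forest) * y_nonforest
        return y, p_forest


model = TinyMultiTaskHead(d_in=8)
x = rng.normal(size=(5, 8))  # 5 pixels, 8 hypothetical input features
y, p = model.forward(x)
print(y.shape, p.shape)
```

Because the blend is convex, each prediction lies between the two tower outputs and leans toward the tower of the more probable class.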
In summary, PSLRC-Net achieves accurate extrapolation of sparse spaceborne LiDAR footprints through deep learning and multi-source data fusion, with high accuracy and adaptability in both forested and non-forested areas. The code is available at: https://github.com/Liiiiiixs/PSLRC-Net (accessed on 20 September 2025).
The rest of this paper is organized as follows. Section 2 introduces the multi-source data used in this study and the study sites. Section 3 describes the proposed method in detail. Section 4 presents and analyzes the experimental results. Section 5 discusses the proposed method. Finally, Section 6 concludes the paper.
2. Study Site and Data
This section describes the study sites and the multi-source data used, including their sources, characteristics, and specific roles in this research.
2.1. Study Site
This study performs experimental validation at two geographic sites, as shown in Figure 2.
The first site is located at the intersection of Luxembourg, Germany, France, and Belgium (–N, –E), with elevations ranging from approximately 130 m to 560 m above sea level (ASL). Since most of the area lies within Luxembourg, we refer to this site as LU. The northern part of this region is predominantly hilly. The central and southern parts are relatively flat, consisting of low hills and plains. Forested areas are distributed in distinct patches with well-defined boundaries.
The second site is on the border between Slovakia and Hungary (–N, –E), with elevations ranging from approximately 100 m to 1400 m ASL. We refer to this site as SK. The northern and eastern parts of this region consist mainly of hills and mountains, with significant topographic variation and extensive forest cover. The central and southern parts are dominated by plains and low hills with relatively flat terrain. In contrast to LU, forests here are more continuous and densely concentrated and span greater elevation variations.
2.2. Spaceborne PolInSAR Data
Dual-polarization (VH and VV) SAR image pairs for repeat-pass interferometry were acquired by the SAOCOM-1B satellite. SAOCOM-1B carries an L-band SAR operating at approximately 1275 MHz, with a spatial resolution of about 3.75 m × 3.70 m. The spatial baselines for LU and SK are 900.51 m and 516.94 m, respectively, and the temporal baselines are 31 days and 16 days, respectively. These data provide the polarimetric and interferometric information used to characterize surface and vegetation structure at both sites.
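The short calculation below relates these acquisition parameters to vertical sensitivity. It is hedged in several ways:

- It converts the 1275 MHz carrier to a wavelength (about 23.5 cm).
- It applies the standard repeat-pass relation $k_z = 4\pi B_\perp / (\lambda R \sin\theta)$ and the height of ambiguity $2\pi/k_z$.
- The slant range (720 km) and incidence angle (30°) are assumed illustrative values, not parameters reported here.
- The quoted spatial baselines are treated as perpendicular baselines, which is also an assumption.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def wavelength(freq_hz):
    """Radar wavelength in meters."""
    return C / freq_hz


def vertical_wavenumber(b_perp, wavelength_m, slant_range, incidence_rad):
    """Repeat-pass (two-way) vertical wavenumber k_z in rad/m."""
    return 4.0 * math.pi * b_perp / (wavelength_m * slant_range * math.sin(incidence_rad))


lam = wavelength(1275e6)
# ASSUMED geometry, not given in the text: slant range and incidence angle.
R = 720e3
theta = math.radians(30.0)
for site, b in (("LU", 900.51), ("SK", 516.94)):  # baselines treated as B_perp (assumption)
    kz = vertical_wavenumber(b, lam, R, theta)
    hoa = 2.0 * math.pi / kz  # height of ambiguity, m
    print(f"{site}: lambda={lam * 100:.1f} cm, kz={kz:.3f} rad/m, HoA={hoa:.1f} m")
```

Under these assumptions, the longer LU baseline gives a larger $k_z$ (finer phase-to-height sensitivity) than SK.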
2.3. Airborne LiDAR Data
The acquisition of airborne LiDAR-derived DEM (AL-DEM) data in Luxembourg began in February 2019, covering the entire geographic area of the country. The data have an average point density of approximately 15 points per square meter. The horizontal accuracy of the data is within ±3 cm, and the vertical accuracy is within ±6 cm.
The Slovakia AL-DEM data were acquired during the leaf-off (vegetation-free) season. The last-return point density is at least 5 points per square meter, with one transverse swath per flight mission and 20% overlap between swaths. The vertical accuracy of the point cloud is 0.11 m, and the horizontal accuracy is 0.30 m.
2.4. Spaceborne LiDAR Data
2.4.1. ICESat-2 Dataset
The ATL08 (
) product from the ICESat-2 (
https://nsidc.org/data/atl08/versions/6) (accessed on 20 September 2025) satellite serves as the primary input for deep learning-based mapping. This specialized laser altimetry dataset is designed to measure terrain and vegetation, providing key parameters such as ground elevation and canopy height. The
represents the terrain elevation at the midpoint of each 100 m segment, obtained from the best polynomial fit to ground photons, and is recommended as the most robust estimate of terrain elevation in ATL08. It provides a spatial resolution of approximately 100 m along the orbital track and centimeter-level vertical accuracy, allowing very fine variations in terrain and vegetation height to be captured [35]. We use the
parameter from the ICESat-2 data to characterize data quality, and exclude footprints with
. For anomalous data that cannot be effectively identified using this parameter alone, we further apply filtering based on an external DEM (TanDEM-X): footprints are removed if the elevation difference with the external DEM is less than −5 m or greater than 40 m.
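The two-stage screening described above (a product quality criterion, then a window on the difference from TanDEM-X) can be sketched as below. Two assumptions apply:

- The exact quality parameter is not reproduced in this excerpt, so it enters as a precomputed boolean mask (`quality_ok`).
- The difference is assumed to be external DEM minus LiDAR ground elevation. Canopy-biased DEMs then produce positive differences of up to 40 m.

The same filter applies to the GEDI footprints below.

```python
import numpy as np


def filter_footprints(h_lidar, h_ext_dem, quality_ok, lower=-5.0, upper=40.0):
    """Return a boolean mask of footprints to keep.

    quality_ok: mask from the product's own quality criterion (computed upstream).
    Difference convention (assumption): external DEM minus LiDAR ground elevation.
    Footprints with a difference below `lower` or above `upper` are removed.
    """
    h_lidar = np.asarray(h_lidar, dtype=float)
    h_ext_dem = np.asarray(h_ext_dem, dtype=float)
    diff = h_ext_dem - h_lidar
    in_range = (diff >= lower) & (diff <= upper)  # NaN differences compare False -> dropped
    return np.asarray(quality_ok, dtype=bool) & in_range


keep = filter_footprints([300, 300, 300, 300], [310, 294, 345, 320], [True, True, True, False])
```

In this toy call, only the first footprint survives. The second (−6 m) and third (+45 m) fall outside the window, and the fourth fails the quality mask.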
2.4.2. GEDI Dataset
The L2A (
) product from GEDI (
https://search.earthdata.nasa.gov/search) (accessed on 20 September 2025) serves as the primary input for deep learning-based mapping. It provides detailed ground elevation and vegetation height data, with individual laser footprints approximately 25 m in diameter. Its vertical accuracy is at the sub-meter level, ranging from a few centimeters to just over a dozen centimeters [36]. We extract parameters from the GEDI L2A data to characterize data quality, such as
,
, and
. These parameters are then used for screening the GEDI data, retaining footprints that meet the criteria of
,
, and
. For anomalous data that cannot be effectively identified using the above parameters, we also apply a filtering strategy based on an external DEM (TanDEM-X): footprints are excluded if their elevation difference from the external DEM is less than −5 m or greater than 40 m.
4. Results
We validated PSLRC-Net in experiments with spaceborne PolInSAR data. The quantity and quality of the spaceborne LiDAR data at the two sites are summarized in Table 2, with the RMSE computed against the AL-DEM. Unless otherwise noted, the ground truth in our experiments is derived from ICESat-2, and the reference DEM is based on TanDEM-X data (denoted TanDEM below). Section 5.1 analyzes the impact of the GEDI dataset, focusing on how data quality and quantity affect the results. Section 5.2 analyzes how the choice of reference DEM affects the accuracy of DEM generation.
During preprocessing, we standardized the features to improve numerical stability during training. Specifically, before windowing, we computed the mean and standard deviation of each feature over the entire data extent listed in Table 1 and normalized each feature using these values. This ensures consistent feature scaling across the dataset and avoids the bias that per-window normalization could introduce. The spaceborne LiDAR footprints were split into training and test sets, with 80% used for training and 20% for testing.
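The preprocessing order described above (global z-scoring first, then windowing, then an 80/20 footprint split) can be sketched as follows. The 5 × 5 window (`half=2`), edge padding, and random seed are assumptions, and the function names are hypothetical.

```python
import numpy as np


def standardize_global(features):
    """features: (n_features, H, W). Z-score each feature with full-scene statistics."""
    mean = np.nanmean(features, axis=(1, 2), keepdims=True)
    std = np.nanstd(features, axis=(1, 2), keepdims=True)
    return (features - mean) / np.where(std > 0, std, 1.0)  # guard constant features


def extract_window(features, row, col, half=2):
    """(n_features, 2*half+1, 2*half+1) window centred on (row, col); edges padded."""
    padded = np.pad(features, ((0, 0), (half, half), (half, half)), mode="edge")
    return padded[:, row:row + 2 * half + 1, col:col + 2 * half + 1]


def split_indices(n, train_frac=0.8, seed=0):
    """Random train/test split of footprint indices."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(train_frac * n))
    return idx[:n_train], idx[n_train:]
```

Normalizing before windowing means every window shares the same scale, so a bright forest patch is not rescaled to look like a dim field.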
4.1. Binary Classification Performance Assessment
This section presents the outputs of the PSLRC-Net classification branch to evaluate its effectiveness. To visualize the dual-polarization data, we used the lexicographic basis instead of the Pauli basis. The color assignment is similar to Pauli RGB: the R channel shows the magnitude of the VV channel, the G channel the magnitude of VH, and the B channel again the magnitude of VV. We call this visualization pseudo-Pauli RGB.
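A pseudo-Pauli RGB composite following this channel assignment might be built as below. The channel mapping follows the text. The dB conversion and the 2nd–98th percentile contrast stretch are assumed display choices, not settings stated in the paper.

```python
import numpy as np


def pseudo_pauli_rgb(s_vv, s_vh, low_pct=2, high_pct=98):
    """R = |VV|, G = |VH|, B = |VV|, each stretched to [0, 1] in dB (stretch is an assumption)."""

    def stretch(amp):
        db = 20.0 * np.log10(np.abs(amp) + 1e-12)  # amplitude to dB
        lo, hi = np.percentile(db, [low_pct, high_pct])
        return np.clip((db - lo) / (hi - lo + 1e-12), 0.0, 1.0)

    r = stretch(s_vv)
    g = stretch(s_vh)
    return np.dstack([r, g, r])  # B duplicates the VV channel
```

Because R and B carry the same VV magnitude, surfaces dominated by VV appear magenta. Strong VH (volume) returns from forest push pixels toward green.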
Figure 5a shows the clustering results of all the footprint points from the LU in a 3D coordinate system composed of elevation difference,
H, and
. The spatial distribution of the clustering results in the pseudo-Pauli RGB is shown in
Figure 5b. For clarity, only a subset of LU is shown. The “Ground” and “Forest” points fall within their respective land cover areas, whereas the “Other” points appear not only in ground areas but also in parts of forested areas. Excluding the “Other” category yields an accurately labeled set. Training an SVM classifier on this labeled set yields classification results for all spaceborne LiDAR footprints; their spatial distribution is shown in Figure 5c. These classification labels are then used to train PSLRC-Net.
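The labeling pipeline above (cluster, discard "Other", train an SVM, label every footprint) can be sketched with scikit-learn under several assumptions:

- The clustering algorithm is described in Section 3, which is not part of this excerpt, so k-means with three clusters on standardized features stands in for it.
- Clusters are named by a hypothetical rule based on mean elevation difference: the lowest is "Ground", the highest is "Forest", and the middle is "Other".
- The three columns are elevation difference, H, and a third descriptor whose symbol is missing from this excerpt.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def label_footprints(features, seed=0):
    """features: (n, 3) = [elevation difference, H, third descriptor].
    Returns 0 = non-forest, 1 = forest for every footprint."""
    z = StandardScaler().fit_transform(features)  # put all axes on a comparable scale
    km = KMeans(n_clusters=3, n_init=10, random_state=seed).fit(z)
    # Hypothetical naming rule: sort clusters by mean elevation difference (column 0).
    ground_c, other_c, forest_c = np.argsort(km.cluster_centers_[:, 0])
    keep = km.labels_ != other_c  # drop the ambiguous "Other" cluster
    y = (km.labels_[keep] == forest_c).astype(int)
    svm = SVC(kernel="rbf", gamma="scale").fit(z[keep], y)
    return svm.predict(z)  # labels for all footprints, including former "Other" points
```

The SVM reassigns the ambiguous "Other" footprints to whichever side of the learned boundary they fall on. This gives every footprint a binary label for the classification branch.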
Figure 6 shows the output of the PSLRC-Net classification branch for the two test sites. Optical imagery from Google Earth and the pseudo-Pauli RGB images partially reveal the distribution of forested and non-forested areas and serve as references for validating the classification results. The classification results agree well with the distributions observed in both the optical imagery and the pseudo-Pauli RGB images. This consistency indicates that the model captures the distinctive features of each category and uses them for accurate classification. It therefore provides a sound basis for the region-specific elevation regression.
For quantitative assessment, the classification accuracy on the test sets of the two sites reached 99.70% and 99.67%, respectively, demonstrating the high reliability of the classification branch.
4.2. Regression Performance Evaluation for PSLRC-Net
To evaluate PSLRC-Net on the regression task, this section compares it with other methods. Traditional machine learning methods such as Random Forest (RF), XGBoost, Gradient Boosting Decision Tree (GBDT), K-Nearest Neighbor (KNN), and Support Vector Regression (SVR) have been widely applied to the inversion of forest parameters such as tree height and aboveground biomass [43,44] by fusing PolInSAR and sparse LiDAR data. In contrast, deep learning has seen more limited application in this field, especially for sub-canopy DEM inversion. In this section, we compare these traditional methods with PSLRC-Net.
Specifically, we use TanDEM data as the benchmark, train each method on the same training set, and evaluate their performance on the test set. The results are presented in Table 3.
Compared with the TanDEM baseline, all the algorithms improve on various metrics for the DEM regression task, but their performance differs significantly. KNN and SVR perform relatively poorly. Although they provide stable predictions to some extent, their overall performance in elevation inversion is clearly inferior to that of the other methods. This is possibly because of their limited ability to model complex high-dimensional data patterns and nonlinear relationships. Ensemble learning algorithms such as RF, XGBoost, and GBDT achieve relatively high accuracy and stability. By combining the predictions of multiple base learners, they improve generalization and prediction accuracy, resulting in reasonably good performance in elevation inversion. PSLRC-Net outperforms all the other methods on all the evaluation metrics. This indicates superior capability for modeling complex high-dimensional data patterns and a good fit to the demands of elevation inversion.
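A comparison protocol of this kind (same training set, same test set, RMSE/MAE/R² per model) can be sketched with scikit-learn as below. The hyperparameters are library defaults or arbitrary choices, not the settings behind Table 3. XGBoost is omitted because it needs the separate `xgboost` package; its `XGBRegressor` could be added to the dictionary in the same way. The inputs are assumed to be flattened per-footprint feature windows.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR


def compare_baselines(X_train, y_train, X_test, y_test, seed=0):
    """Fit each baseline on the same split and report RMSE, MAE, and R^2 on the test set."""
    models = {
        "RF": RandomForestRegressor(n_estimators=100, random_state=seed),
        "GBDT": GradientBoostingRegressor(random_state=seed),
        "KNN": KNeighborsRegressor(n_neighbors=5),
        "SVR": SVR(C=10.0),  # illustrative setting; SVR is sensitive to feature/target scaling
    }
    results = {}
    for name, model in models.items():
        pred = model.fit(X_train, y_train).predict(X_test)
        results[name] = {
            "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
            "MAE": float(mean_absolute_error(y_test, pred)),
            "R2": float(r2_score(y_test, pred)),
        }
    return results
```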
It should be noted that the $R^2$ values close to 1 in Table 3 do not indicate model overfitting; they mainly result from the large elevation range of the study areas. By definition,

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2},$$

where $y_i$ is the reference elevation of the $i$-th test sample, $\hat{y}_i$ is the predicted elevation, and $\bar{y}$ is the mean reference elevation. When the overall relief is large, the denominator becomes very large, so even meter-level residuals yield $R^2$ values close to 1. The high $R^2$ therefore reflects a terrain-scale effect rather than overfitting. The RMSE and MAE on the independent test sets are significantly lower than those of the baseline methods, and these metrics more directly reflect prediction accuracy and generalization ability.
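The numerical illustration below applies the $R^2$ definition above to synthetic elevations. The same residuals (about 2 m RMS) are added to a nearly flat scene with 10 m of relief and to a rugged scene spanning roughly the SK elevation range. The data are synthetic and chosen only to show the terrain-scale effect.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination, as defined above."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(42)
residuals = rng.normal(0.0, 2.0, size=1000)      # identical ~2 m errors in both scenes

flat = rng.uniform(200.0, 210.0, size=1000)      # ~10 m of relief
rugged = rng.uniform(100.0, 1400.0, size=1000)   # ~1300 m of relief (SK-like range)

r2_flat = r2(flat, flat + residuals)
r2_rugged = r2(rugged, rugged + residuals)
```

The RMSE is identical in both cases. However, $R^2$ drops to roughly one half for the flat scene and exceeds 0.999 for the rugged one, which is why RMSE and MAE are the more informative metrics here.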
4.3. DEM Result Assessment
Given the strong performance of PSLRC-Net on the test set, this section uses PSLRC-Net to extrapolate the spaceborne LiDAR footprints to the entire area and validates the results against the AL-DEM. The DEM inversion results for the two sites are shown in Figure 7.
We compared the PSLRC DEM with several external DEMs, namely ASTER GDEM, SRTM, AW3D30, and TanDEM-X (sorted by acquisition time), and with DEMs derived from the traditional RVoG model, all converted to the EGM96 [45] height reference. The traditional three-stage RVoG inversion suffers from the “double-candidate effect” [21]. Moreover, with dual-polarization data, the different penetration depths of the scattering channels in the Pauli basis cannot be used to select the terrain phase [46]. To address this problem, we adopted the method proposed in Ref. [22] to obtain the RVoG model DEM, which resolves terrain phase selection for dual-polarization data.
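The sketch below shows where the "double-candidate effect" comes from. It implements the generic line-fit step of the classical three-stage RVoG inversion, not the method of Ref. [22]:

- Modeled coherences for two ground-to-volume ratios lie on a straight line.
- Intersecting that line with the unit circle always yields two candidate ground points.
- Geometry alone does not say which candidate is the ground.

The values of `kz`, `theta`, `phi0`, the extinction, and the vegetation height are hypothetical.

```python
import numpy as np


def volume_coherence(h_v, sigma, kz, theta):
    """RVoG volume-only coherence for an exponential extinction profile (closed form)."""
    p = 2.0 * sigma / np.cos(theta)
    p1 = p + 1j * kz
    return (p / p1) * (np.exp(p1 * h_v) - 1.0) / (np.exp(p * h_v) - 1.0)


def rvog_coherence(phi0, gamma_v, m):
    """Observed coherence for ground-to-volume ratio m (ideal ground, no temporal decorrelation)."""
    return np.exp(1j * phi0) * (gamma_v + m) / (1.0 + m)


def unit_circle_candidates(g1, g2):
    """Intersect the line through g1 and g2 with |z| = 1; returns two candidate ground points."""
    d = g2 - g1
    a = abs(d) ** 2
    b = 2.0 * np.real(np.conj(g1) * d)
    c = abs(g1) ** 2 - 1.0  # negative when g1 lies inside the unit circle
    disc = np.sqrt(b * b - 4.0 * a * c)
    return [g1 + t * d for t in ((-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a))]


# Hypothetical scene: k_z = 0.13 rad/m, 35 deg incidence, 20 m canopy, ~0.1 dB/m extinction.
kz, theta, phi0 = 0.13, np.radians(35.0), 0.6
gv = volume_coherence(20.0, 0.0115, kz, theta)
g_low = rvog_coherence(phi0, gv, 0.2)   # volume-dominated channel
g_high = rvog_coherence(phi0, gv, 2.0)  # ground-dominated channel
cands = unit_circle_candidates(g_low, g_high)
heights = [float(np.angle(z) / kz) for z in cands]  # candidate terrain heights, z0 = phi0 / kz
```

One candidate recovers the true ground phase, giving $0.6/0.13 \approx 4.6$ m. The other is a spurious intersection. With full-polarization data, channel-dependent penetration helps choose between them, which is not possible in the same way for dual-polarization data.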
Because the DEM values span a large range, differences between DEMs are difficult to observe directly. This study therefore presents the elevation differences between each DEM and the AL-DEM. In the difference maps, green areas (near zero on the color bar) indicate that the DEM is close to the AL-DEM, while yellow areas indicate that the DEM is higher than the AL-DEM. As shown in Figure 8, all the DEMs except the PSLRC DEM differ significantly from the AL-DEM. GDEM performs the worst, while the other DEM products show broadly consistent trends, generally overestimating elevations in forested areas. AW3D30 shows significant elevation discontinuities caused by regional stitching. Because the RVoG inversion is unstable, it fails in some areas. In contrast, the PSLRC DEM is closer to the AL-DEM.
Using the forest/non-forest classification map, we evaluated each DEM against the AL-DEM separately for the different region types at the two sites. The results are shown in Table 4. The PSLRC DEM outperforms the other DEMs on all the metrics in every region type, with particularly large improvements in forested areas. Compared with the other DEMs, over the entire region the RMSE decreased by 51.7–64.6% and 51.9–63.7% at the two sites, and the MAE decreased by 55.5–66.8% and 55.5–68.6%. In the forested areas, the RMSE decreased by 46.4–65.2% and 51.9–63.4%, and the MAE decreased by 50.2–68.2% and 53.8–66.2%. Even in non-forested areas, where errors are relatively small, the PSLRC DEM still achieves high accuracy, demonstrating its reliability in elevation inversion.
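The region-wise evaluation and the percentage reductions reported above can be sketched as follows. The per-class masking follows the text. The function names are hypothetical, and the toy arrays in the usage check are synthetic, not data from Table 4.

```python
import numpy as np


def region_metrics(dem, ref, forest_mask):
    """RMSE and MAE of `dem` against reference `ref` over all, forest, and non-forest pixels."""
    diff = np.asarray(dem, dtype=float) - np.asarray(ref, dtype=float)
    forest_mask = np.asarray(forest_mask, dtype=bool)
    valid = np.isfinite(diff)  # ignore no-data pixels in either DEM
    out = {}
    for name, m in (("all", valid),
                    ("forest", valid & forest_mask),
                    ("non-forest", valid & ~forest_mask)):
        d = diff[m]
        out[name] = {"RMSE": float(np.sqrt(np.mean(d ** 2))), "MAE": float(np.mean(np.abs(d)))}
    return out


def relative_reduction(baseline, improved):
    """Percentage reduction of an error metric, e.g. RMSE: 100 * (baseline - improved) / baseline."""
    return 100.0 * (baseline - improved) / baseline
```

For example, an RMSE falling from 10 m to 4 m is a 60% reduction. A range such as 51.7–64.6% corresponds to applying this formula against each comparison DEM in turn.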
The histograms of the differences between each DEM and the AL-DEM in Figure 9 show that GDEM is generally lower, while SRTM and AW3D30 have similar distributions. The differences of the PSLRC DEM and TanDEM are both centered near zero. However, because TanDEM overestimates heights in forested areas, its distribution retains a large share of high positive differences.
4.4. Ablation Study
To understand the contribution of each component to overall performance, we conducted ablation experiments that systematically evaluate the key modules of the model. Each core component was removed or replaced individually, after which the model was retrained and used for DEM inversion. The regional accuracy was then evaluated against the AL-DEM. Detailed results are presented in Table 5.
Table 5 first lists models with individual modules removed: the feature encoder (FE), the expert mechanism (EM), and the classification head (CH). It then lists a model that uses only one regression tower (ORT). Finally, it lists models without interferometric geometry features (IG), polarization features (Pol), or polarimetric decomposition features (PDec). The objective is to assess the impact of each module or feature on overall model performance.
In the module ablation study, removing the feature encoder significantly reduced model accuracy, highlighting its critical role in feature extraction. The classification head and the dual regression towers form the core of the multi-task structure. Removing the classification head or using a single regression tower significantly degraded performance. This demonstrates that the multi-task structure enhances the model's ability to learn region-specific features and thereby improves regression accuracy. Removing the expert mechanism also caused a noticeable degradation. In the feature ablation study, excluding the polarimetric decomposition features led to a significant reduction in accuracy, underscoring their importance for modeling complex scenes such as forested areas. The interferometric geometry and polarization features also contributed valuable information, and together these features support high-precision prediction. Finally, the complete PSLRC-Net achieves the lowest RMSE and MAE on both the LU and SK datasets, confirming the effectiveness of the overall architecture.
4.5. Computational Efficiency Analysis
This section evaluates the computational efficiency of PSLRC-Net. We implemented PSLRC-Net in the PyTorch 2.4 framework, and all experiments were conducted on an NVIDIA GeForce RTX 4080 GPU. The training batch size was 64, and we used the AdamW optimizer with an initial learning rate of 2 × 10⁻⁴. The learning rate was decayed with an ExponentialLR scheduler using a decay factor of 0.95. During inference, the batch size was set to 4096 to make full use of GPU memory and improve computational efficiency.
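The stated schedule can be made concrete with a small sketch. In PyTorch it corresponds to `torch.optim.AdamW(model.parameters(), lr=2e-4)` combined with `torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)`. The plain-Python version below computes the learning rate that results when `scheduler.step()` is called once per epoch; the text does not state the stepping frequency, so this is an assumption. It also computes how many inference batches a scene needs at a batch size of 4096.

```python
import math


def exponential_lr(lr0, gamma, step):
    """Learning rate after `step` scheduler steps under exponential decay: lr0 * gamma**step."""
    return lr0 * gamma ** step


def n_inference_batches(n_pixels, batch_size=4096):
    """Number of forward passes needed to cover `n_pixels` at the given batch size."""
    return math.ceil(n_pixels / batch_size)


for epoch in (0, 10, 50, 100):
    print(epoch, f"{exponential_lr(2e-4, 0.95, epoch):.2e}")
```

With a factor of 0.95, the learning rate falls to below 10% of its initial value after 50 steps. The step frequency therefore matters for how long the model keeps learning at a useful rate.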
Table 6 shows the training time (Tr. Time), inference time (Infer. Time), and inference memory usage (Infer. Mem.) for the LU and SK test areas.
The method completes training on large-area data in 5–8 min and generates high-resolution DEMs in 15 min, demonstrating good computational efficiency. Memory consumption during inference is about 348.89 MB, indicating efficient use of resources.
6. Conclusions
This paper proposed PSLRC-Net for generating high-quality DEMs and introduced a method for obtaining classification labels of spaceborne LiDAR footprints for the network's classification branch. PSLRC-Net first extracts feature cubes from PolInSAR data and an external DEM and refines them with a feature encoder module to extract rich, detailed information. Next, a gated expert selection module provides specialized support for the classification and regression branches. The outputs of the two regression branches are then weighted by the output of the classification branch, yielding more accurate predictions. This approach efficiently extrapolated elevation information from sparse spaceborne LiDAR over large areas and generated high-resolution, high-precision DEMs for the study regions in Luxembourg and Slovakia. The ablation experiments verified that the modules and selected features of PSLRC-Net improve model performance. We also analyzed the effects of the quality and quantity of the target values, as well as of different reference DEMs, on the prediction results.
Future research will proceed in two directions. First, by extending the training dataset, we aim to move from an in-domain problem to a global one and thus achieve a more universal model. Second, pixel-based inversion network architectures are limited by the information available around each pixel. Although window operations capture some neighborhood information, they cannot fully exploit wider surrounding information. The next step will therefore be to move from pixel-level to region-level inversion to capture and use a wider range of contextual detail. In addition, we will further improve prediction accuracy by optimizing the model structure and algorithms.
Furthermore, PSLRC-Net is not limited to DEM generation and can be extended to other forest parameter inversion tasks, such as tree height and biomass inversion. Because forested and non-forested areas also differ significantly in these tasks, the network architecture can exploit these regional feature differences to improve inversion accuracy across domains.