Technical Note

A Hybrid RNN-CNN Approach with TPI for High-Precision DEM Reconstruction

The School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430072, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(16), 2770; https://doi.org/10.3390/rs17162770
Submission received: 14 June 2025 / Revised: 1 August 2025 / Accepted: 7 August 2025 / Published: 9 August 2025

Abstract

Digital elevation models (DEMs) are fundamental representations of terrain morphology and are crucial for understanding surface processes and for land use planning. However, automated DEM generation faces challenges due to inefficient terrain feature extraction from raw LiDAR point clouds and the limitations of traditional methods in capturing fine-scale topographic variations. To address this, we propose a novel hybrid RNN-CNN framework that integrates multi-scale Topographic Position Index (TPI) features to enhance DEM generation. Our approach first models voxelized LiDAR point clouds as spatially ordered sequences, using Recurrent Neural Networks (RNNs) to encode vertical elevation dependencies and Convolutional Neural Networks (CNNs) to extract planar spatial features. By incorporating TPI as a semantic constraint, the model learns to distinguish terrain structures at multiple scales, while residual connections refine feature representations to preserve micro-topographic details during DEM reconstruction. Extensive experiments in the complex terrain of Jiuzhaigou, China, demonstrate that our lightweight hybrid framework not only achieves high DEM reconstruction accuracy but also improves computational efficiency by more than 20% on average compared to traditional interpolation methods, making it well suited to resource-constrained applications.

1. Introduction

Digital Elevation Models (DEMs) play a crucial role in various geospatial applications that significantly impact the study of the ecological environment and natural resources [1,2]. The accurate reconstruction of fine-scale DEMs from raw LiDAR point clouds is particularly critical, as these high-precision terrain representations serve as indispensable foundational datasets for a multitude of advanced applications. For instance, in geomorphological analysis [3,4], highly detailed DEMs enable precise landform extraction and classification, as well as the derivation of terrain covariates crucial for understanding geological processes and soil properties. In hydrological modeling [5,6], they are fundamental for accurate surface drainage mapping, water flow simulation, and delineating watersheds, which are vital for water resource management and flood risk assessment. Furthermore, for ecological and environmental studies [7,8], high-fidelity DEMs provide essential topographic context for analyzing fire regimes in forest ecosystems, and for developing robust ground filtering algorithms to process UAV LiDAR data in complex forested environments, facilitating more accurate vegetation and terrain analysis. Consequently, the development of automated algorithms capable of transforming raw point cloud data into high-precision DEMs is essential for improving the efficiency and accuracy of digital terrain modeling workflows [9].
Current methodologies for DEM reconstruction fall into three main types: traditional interpolation methods, machine learning approaches, and deep learning techniques. Traditional approaches, such as nearest-neighbor interpolation [10] and linear interpolation, are computationally simple and efficient and have been widely adopted for their straightforward implementation and minimal overhead. However, they frequently struggle to capture fine-scale terrain features and topographic details, and they cope poorly with heterogeneous landscapes that exhibit varying terrain characteristics and spatial complexity [11]. More sophisticated techniques have been developed to address some of these limitations, including morphological filtering [12], slope-based filtering with threshold criteria for terrain classification [13], and segmentation-based filtering that partitions the landscape into distinct topographic units [14]. While these techniques measurably outperform their traditional counterparts, they remain sensitive to data noise and environmental interference, and they typically require extensive manual preprocessing and parameter tuning, which is time-consuming and demands substantial domain expertise [15]. In response to these persistent limitations, machine learning methods have emerged as promising alternatives that reframe DEM generation as a regression problem solvable through statistical learning. Algorithms such as decision trees, support vector machines (SVMs) [16], and random forest (RF) ensembles [17,18] have been applied successfully, leveraging geospatial features including point density distributions, local slope characteristics, and neighborhood statistics [19,20,21,22,23]. Although these methods adapt better to complex and varied terrain, their effectiveness depends on the assumption that training and testing data share similar spatial distributions and geomorphological characteristics, which restricts their generalizability across geographic regions and terrain types. Their performance varies considerably across geomorphic regions with distinct topographic signatures, and maintaining large-scale spatial consistency remains a persistent challenge [24,25,26].
With the rapid advancement of deep learning in recent years, new opportunities have emerged for improving model generalization and spatial coherence in DEM reconstruction [27,28,29,30]. Convolutional Neural Networks (CNNs) [31,32] effectively capture complex two-dimensional spatial patterns and hierarchical feature representations, enabling superior learning of intricate topographic features [33] and overcoming many of the limitations that have historically hindered traditional machine learning in geospatial applications [34]. Although several pioneering studies have applied deep learning to high-resolution DEM reconstruction [35,36,37,38], most focus on reconstructing high-accuracy DEMs from low-accuracy DEMs, and only a few have derived highly accurate DEMs from digital surface models (DSMs) [39]. Directly generating high-precision DEMs from raw LiDAR point clouds therefore remains a largely unresolved challenge: it requires handling the inherent complexity and irregularity of point cloud data while maintaining the spatial accuracy and topographic fidelity necessary for high-quality DEM products.
To overcome these challenges, we propose a hybrid RNN-CNN framework that integrates multi-scale Topographic Position Index (TPI) features for the coordinated optimization of high-precision DEM reconstruction. TPI is one of the most widely used techniques for landform classification: by computing the difference in elevation between a given point and its surrounding neighborhood, it helps identify ridges, valleys, peaks, and other landforms. The approach proposed by Jenness [40] and Weiss [41] uses fixed neighborhood sizes to calculate TPI and classify landforms; we introduce TPI here to obtain a more refined DEM. In summary, this study constructs a DEM reconstruction method that accounts for elevation dependency, planar spatial features, and multi-scale terrain semantic constraints by combining a hybrid RNN-CNN framework with multi-scale TPI features. It retains high-resolution terrain details while improving reconstruction accuracy through multi-scale landform structure learning, providing an innovative solution for high-precision DEM reconstruction.
This work makes three key contributions. First, we propose a hybrid RNN-CNN framework for end-to-end high-precision DEM reconstruction directly from raw LiDAR point clouds; it leverages RNNs to model vertical elevation dependencies within voxel columns and CNNs to capture planar spatial features, thereby effectively processing multi-dimensional terrain information. Second, we integrate multi-scale TPI as a differentiable semantic constraint within the deep learning loss function, guiding the model to generate DEMs with enhanced geomorphometric fidelity and realistic terrain patterns across scales. Third, extensive experiments conducted in the complex terrain of Jiuzhaigou demonstrate that our lightweight model achieves high reconstruction accuracy while improving computational efficiency by more than 20% on average compared to traditional interpolation methods, offering a practical solution for resource-constrained applications.
The structure of this paper is as follows:
  • Section 2 introduces our data;
  • Section 3 details the proposed method and its innovations;
  • Section 4 presents the experimental setup and results, highlighting the advantages of our method over traditional approaches;
  • Section 5 discusses the implications of our research findings and potential future directions.

2. Data Set

The study area (Figure 1) is located in the northern part of Jiuzhaigou County, Aba Tibetan and Qiang Autonomous Prefecture, in the southern Minshan Mountain Range of northwest Sichuan, China. Situated in the Jiuzhaigou National Nature Reserve, it is part of a transitional zone between the southeastern Tibetan Plateau and the Sichuan Basin. The region lies at the boundary of China’s first topographic step, where diverse geomorphological processes have occurred, including fluvial and glacial erosion, forming the predominant mountainous, hilly, and valley landscapes. The study area (103°50′15″–103°55′15″E, 33°4′45″–33°11′15″N) covers 65.9 km2, with elevation varying between 2306 and 4306 m, and a relative elevation difference of 2000 m. Steep high mountains bound the eastern and western sides of the region, while a north–south trending high mountain canyon runs through the middle, creating five small alpine lakes in the northern part.
For our experimental evaluations, specific subsets of this high-resolution LiDAR dataset from Jiuzhaigou were selected to represent distinct terrain characteristics. Experiment 1 focused on data representing steep slope terrain, while Experiment 2 utilized data from undulating terrain. For both experiments, the corresponding ground truth DEMs used for training and evaluation were prepared from these same high-resolution LiDAR point clouds. These reference DEMs represent the bare-earth terrain, generated through established manual editing to remove non-ground features such as vegetation and buildings, followed by high-fidelity interpolation to a regular grid. This ensures that our model learns from precise terrain representations during the supervised training phase. The details of the LiDAR point cloud dataset for the Jiuzhaigou study area are presented in Table 1.

3. Methodology

Our methodology is designed to transform raw LiDAR point clouds into high-fidelity DEMs through a lightweight processing pipeline. This pipeline encompasses voxelization, feature extraction leveraging both RNNs and CNNs, and topographic supervision via TPI constraints. The process commences with denoising and voxelization of the raw input data, followed by ranking and embedding elevation values to capture vertical relationships. An RNN encodes the vertical context, while a CNN extracts planar spatial features, with the resulting representations fused via residual connections. The fused features are then decoded into a DEM under multi-scale TPI constraints, and the output is refined through a threshold-based median filtering post-processing step to enhance topographic fidelity. By integrating the sequential processing strengths of RNNs with the spatial feature extraction capabilities of CNNs, and by enforcing TPI-driven geomorphometric coherence, the framework produces DEMs that are both accurate and geomorphologically realistic. The complete workflow is illustrated in Figure 2.

3.1. Voxelization and Spatially Ordered Sequences

In processing the raw LiDAR point cloud, we first perform preprocessing to eliminate noise and outliers, ensuring a high-quality dataset for subsequent analysis. Following this, the denoised points are systematically mapped onto a regular voxel grid defined by a uniform spatial resolution. Each voxel is uniquely identified by its coordinates within this grid, and for every horizontal position (each $(i, j)$ grid cell on the 2D plane), we aggregate the elevation values of occupied voxels along the vertical dimension (along the $k$-axis, forming a vertical column of voxels). These elevation values are ordered to construct a sequence that encapsulates the vertical profile at each location. This spatially ordered sequence representation is particularly advantageous in managing regions with heterogeneous point densities, as it selectively retains valid elevation data, thereby preserving the intricate three-dimensional geometry of the terrain in an efficient and organized manner. Mathematically, the spatial resolution of the voxel grid is defined as $(\Delta x, \Delta y, \Delta z)$, with each voxel $V(i, j, k)$ centered at
$$x_i = x_{\min} + i\,\Delta x, \qquad y_j = y_{\min} + j\,\Delta y, \qquad z_k = z_{\min} + k\,\Delta z$$
where $(x_{\min}, y_{\min}, z_{\min})$ are the minimum coordinates of the point cloud and $i, j, k$ are integer indices. At each horizontal location $(i, j)$, we collect all occupied $z_k$ values and sort them in ascending order to form a variable-length sequence
$$S_{i,j} = \{ z_{k_1}, z_{k_2}, \dots, z_{k_m} \}, \qquad k_1 < k_2 < \dots < k_m$$
where $m$ is the number of nonempty voxels in that column. This representation addresses the issue of empty voxels by storing only valid elevations, and it provides an efficient and organized way to capture the detailed vertical distribution of points, which is crucial for modeling the terrain's vertical profile within our network.
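As a concrete illustration, the following minimal NumPy sketch builds the spatially ordered sequences $S_{i,j}$ from a denoised point cloud; the function and variable names are our own, not taken from the released implementation:

```python
import numpy as np

def voxelize_to_columns(points, dx, dy, dz):
    """Build the spatially ordered sequences S_{i,j}: map each point of an
    (N, 3) array to a voxel, then collect and sort the occupied
    voxel-center elevations of every vertical column."""
    mins = points.min(axis=0)                           # (x_min, y_min, z_min)
    steps = np.array([dx, dy, dz])
    idx = np.floor((points - mins) / steps).astype(int)  # integer voxel indices
    columns = {}
    for i, j, k in idx:
        # Voxel-center elevation z_k = z_min + k * dz, as in the formula above.
        columns.setdefault((i, j), set()).add(mins[2] + k * dz)
    # Sort each column's occupied elevations into the sequence S_{i,j}.
    return {ij: sorted(zs) for ij, zs in columns.items()}
```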

3.2. Elevation Index Representation and Encoding

To address the inherent variability and noise in raw elevation measurements, we employ a rank-based transformation strategy. Specifically, each elevation within a sequence is converted into its positional rank relative to the other elevations at the same horizontal location. This ordinal representation, denoted $r_k$, emphasizes the vertical relationships among points, mitigating the influence of absolute elevation fluctuations and noise. Subsequently, these ranks are projected into a high-dimensional vector space through a learnable embedding mechanism, which encodes the structural dependencies and relational attributes of the elevation sequence. The embedding for a rank $r_k$ is denoted $e_k$. These embedded vectors serve as the foundational input for the sequence encoding phase, enabling robust modeling of vertical patterns across the terrain. For each elevation $z_k$ in $S_{i,j}$, its rank is defined as
$$r_k = \operatorname{Rank}(z_k, S_{i,j})$$
and its embedding $e_k$ is
$$e_k = \phi(r_k), \qquad k = 1, \dots, m$$
where $\phi$ is a learnable embedding function (inspired by Natural Language Processing (NLP) embedding techniques [42]) that produces $d$-dimensional vectors capturing semantic relationships among elevation ranks. These embeddings serve as the input to our sequence encoder. Specifically, we utilize 300 distinct elevation rank levels, and the embedding dimension $d$ is set to 32.
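A minimal PyTorch sketch of this rank-and-embed step, using the 300 rank levels and $d = 32$ stated above (the class and variable names are illustrative):

```python
import torch
import torch.nn as nn

NUM_RANKS, EMBED_DIM = 300, 32   # 300 elevation rank levels, d = 32

class RankEmbedding(nn.Module):
    """Convert one column's elevation sequence to ordinal ranks r_k and
    look them up in a learnable embedding table (the function phi)."""
    def __init__(self):
        super().__init__()
        self.phi = nn.Embedding(NUM_RANKS, EMBED_DIM)

    def forward(self, elevations):   # (m,) float tensor for one column
        # argsort-of-argsort yields each value's rank r_k in 0..m-1.
        ranks = torch.argsort(torch.argsort(elevations))
        return self.phi(ranks.clamp(max=NUM_RANKS - 1))  # (m, d) embeddings e_k
```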

3.3. Three-Dimensional Feature Extraction Framework

Given the computational demands of directly applying three-dimensional convolutions across the entire voxel grid, we adopt a hybrid processing paradigm that judiciously separates the extraction of vertical and horizontal features. This dual-strategy approach is designed to optimize computational efficiency while aiming to capture critical three-dimensional spatial information, thereby facilitating the generation of accurate and detailed DEMs.

3.3.1. Sequential Feature Modeling with RNNs

At each horizontal grid position, the sequence of embedded elevation ranks is processed by a Gated Recurrent Unit (GRU)-based Recurrent Neural Network. The GRU architecture is particularly well-suited for this task, as it effectively captures long-range dependencies and local geometric nuances within the elevation sequence. By iteratively processing the sequence, the GRU distills a comprehensive representation of the vertical structure into its final hidden state. This compact encoding retains essential elevation details without requiring aggressive downsampling, thereby preserving the fidelity of the terrain's vertical characteristics. For each column $(i, j)$, the hidden state $h_{i,j}$ is obtained through the GRU's processing of the embedded sequence $\{e_k\}$.
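A minimal sketch of such a column encoder, consistent with the GRU hyperparameters reported in Section 4.1 (two layers, hidden size 32); the wrapper class itself is illustrative:

```python
import torch.nn as nn

class ColumnEncoder(nn.Module):
    """Two-layer GRU that reads one column's embedded rank sequence and
    returns its final hidden state h_{i,j} as a compact descriptor of the
    vertical profile."""
    def __init__(self, embed_dim=32, hidden=32, layers=2):
        super().__init__()
        self.gru = nn.GRU(embed_dim, hidden, num_layers=layers,
                          batch_first=True)

    def forward(self, seq):          # seq: (1, m, embed_dim)
        _, h_n = self.gru(seq)       # h_n: (layers, 1, hidden)
        return h_n[-1, 0]            # h_{i,j}: (hidden,)
```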

3.3.2. Spatial Feature Enhancement with CNNs

To integrate lateral spatial relationships into the feature set, the hidden states generated by the GRU are organized into a two-dimensional feature map aligned with the horizontal grid layout. This map is subsequently processed by a CNN employing a U-Net-inspired architecture. The CNN applies a series of downsampling and upsampling operations, augmented by residual connections, to extract multi-scale spatial features. These residual connections play a pivotal role in preserving fine-scale details from the RNN outputs across the convolutional layers, ensuring that the resulting feature set captures both broad spatial patterns and localized terrain variations. The CNN output is then fused with the original RNN hidden states through an additional residual linkage, creating a rich feature representation that balances vertical and horizontal terrain information. Mathematically, the fused feature is
$$f_{i,j} = h_{i,j} + c_{i,j}$$
where $c_{i,j}$ is the feature extracted by the CNN.
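The following toy sketch illustrates the residual fusion $f_{i,j} = h_{i,j} + c_{i,j}$; the two-layer CNN here is a simplified stand-in for the deeper U-Net-style branch described above, whose actual channel widths are listed in Section 4.1:

```python
import torch.nn as nn

class SpatialFusion(nn.Module):
    """Extract planar features c_{i,j} from the grid of GRU hidden states
    h and fuse them back through the residual link f = h + c."""
    def __init__(self, ch=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(ch, ch, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
        )

    def forward(self, h):            # h: (B, ch, H, W) grid of hidden states
        return h + self.cnn(h)       # fused features f_{i,j} = h_{i,j} + c_{i,j}
```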

3.4. Decoder Design and Operational Mechanism

The decoder constitutes a pivotal element of our methodology, tasked with synthesizing the final elevation predictions from the integrated features produced by the preceding stages. It leverages the complementary strengths of sequential and spatial processing to generate DEMs that accurately reflect the terrain's multifaceted characteristics. The decoder employs a GRU-based recurrent neural network, complemented by an embedding layer for input processing and a fully connected layer for output mapping. This configuration enables the generation of elevation class predictions tailored to the quantized elevation space. The decoding process begins with a placeholder input, which is embedded into a high-dimensional space to establish the starting point for sequence generation. Subsequently, this embedded input is combined with the fused features from the encoder and CNN, providing a holistic context for elevation prediction. The GRU processes this integrated input in a single pass, producing an output that encapsulates the synthesized terrain information. This output is then transformed through a fully connected layer into logits representing quantized elevation classes, which are subsequently mapped to continuous elevation values. The residual connection ensures that intricate vertical details from the encoder are effectively incorporated into the decoding process, enhancing the reconstruction of complex terrain features. The predicted elevation $\hat{z}_{i,j}$ is obtained from the class prediction $\hat{k}_{i,j} = \arg\max_k p_{i,j}(k)$, where $p_{i,j}(k)$ is the probability of class $k$.
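A simplified sketch of the final output mapping, omitting the decoder GRU and placeholder embedding for brevity; the class count and the bin-center mapping $z_{\min} + k \cdot \Delta z$ are illustrative assumptions, not values from the paper:

```python
import torch.nn as nn

class ElevationHead(nn.Module):
    """Project fused features to logits over quantized elevation classes
    and map the argmax class back to a continuous elevation."""
    def __init__(self, feat_dim=32, num_classes=300, z_min=0.0, dz=1.0):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)
        self.z_min, self.dz = z_min, dz

    def forward(self, f):                      # f: (B, H, W, feat_dim)
        logits = self.fc(f)                    # (B, H, W, num_classes)
        k_hat = logits.argmax(dim=-1)          # class prediction k_hat
        z_hat = self.z_min + k_hat.float() * self.dz   # elevation z_hat
        return logits, z_hat
```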

3.5. Incorporation of TPI Constraints

To ensure that the generated DEMs exhibit realistic topographic characteristics, we introduce supervision based on the TPI during the training phase. TPI serves as a localized metric that quantifies terrain features by comparing a point's elevation to the average elevation of its surrounding neighborhood. At a given scale $s$, TPI is computed as
$$\mathrm{TPI}_{i,j}^{(s)} = \hat{z}_{i,j} - \frac{1}{|N_s(i,j)|} \sum_{(p,q) \in N_s(i,j)} \hat{z}_{p,q}$$
where $N_s(i,j)$ is the set of neighbors within distance $s$. We calculate TPI values for both predicted and ground-truth elevations across multiple scales. The TPI loss, denoted $L_{\mathrm{TPI}}$, is defined as the mean squared difference between these values, aggregated over all grid cells and scales:
$$L_{\mathrm{TPI}} = \sum_{s \in S} \frac{1}{N} \sum_{i,j} \left( \mathrm{TPI}_{i,j}^{(s)} - \mathrm{TPI}_{i,j,\mathrm{gt}}^{(s)} \right)^2$$
Here, $S$ represents the set of scales chosen for TPI calculation. This multi-scale approach allows the model to capture a comprehensive range of terrain features, from localized micro-topography to broader geomorphological structures. For our experiments, we empirically selected three distinct scales: a small scale ($s_1$: 3 × 3 grid cells), a medium scale ($s_2$: 9 × 9 grid cells), and a large scale ($s_3$: 27 × 27 grid cells) to represent local, mesoscale, and regional topographic variations, respectively. Each scale is weighted equally in the summation.
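Because TPI is a difference from a local mean, it can be computed differentiably with average pooling. A minimal PyTorch sketch of the multi-scale TPI loss follows; the pooling-based formulation is our illustration of the idea, not necessarily the authors' exact implementation:

```python
import torch.nn.functional as F

def tpi(z, s):
    """Differentiable TPI at an s x s scale: each cell's elevation minus
    the mean of its neighborhood, via average pooling. z: (B, 1, H, W);
    odd s with padding s // 2 preserves the grid size."""
    mean = F.avg_pool2d(z, kernel_size=s, stride=1, padding=s // 2,
                        count_include_pad=False)
    return z - mean

def tpi_loss(z_pred, z_gt, scales=(3, 9, 27)):
    """L_TPI: mean squared TPI difference, summed over the 3x3, 9x9, and
    27x27 scales with equal weight."""
    return sum(F.mse_loss(tpi(z_pred, s), tpi(z_gt, s)) for s in scales)
```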
This TPI loss is combined with the primary elevation classification objective to form the total loss function. The primary objective, $L_{\mathrm{cls}}$, is a masked softmax cross-entropy loss designed to handle areas with no valid ground truth data. The total loss $L_{\mathrm{total}}$ is formulated as
$$L_{\mathrm{total}} = L_{\mathrm{cls}} + \lambda L_{\mathrm{TPI}}$$
where $\lambda$ is a weighting coefficient that balances the contribution of the TPI loss relative to the classification loss. Through empirical tuning, we set $\lambda = 0.1$; the weight is kept static throughout training. This weighting scheme ensures a balance between precise elevation prediction and geomorphometric fidelity, reducing artifacts and enhancing the realism of the DEM.
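A corresponding sketch of the combined objective, reusing the tpi_loss helper above; the masking logic is an illustrative reading of the masked cross-entropy described in the text:

```python
import torch.nn.functional as F

def total_loss(logits, classes, valid_mask, z_pred, z_gt, lam=0.1):
    """L_total = L_cls + lambda * L_TPI with lambda = 0.1. logits:
    (B, C, H, W); classes: (B, H, W) target elevation bins; valid_mask:
    (B, H, W) float mask zeroing cells without ground truth."""
    ce = F.cross_entropy(logits, classes, reduction="none")   # (B, H, W)
    l_cls = (ce * valid_mask).sum() / valid_mask.sum().clamp(min=1.0)
    return l_cls + lam * tpi_loss(z_pred, z_gt)
```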
By enforcing TPI constraints at multiple scales, the model captures a spectrum of terrain features, from fine-scale details to broader landforms.

3.6. Post-Processing Refinement

To polish the predicted DEM and address residual noise, we implement a post-processing phase utilizing a threshold-based median filtering technique. For each grid cell, the median elevation of its immediate neighbors is computed. If the predicted elevation deviates from this median beyond a specified threshold, it is adjusted to the median value. This step mitigates isolated anomalies, such as spikes or depressions, while preserving genuine micro-topographic features by avoiding excessive smoothing. Mathematically, if $|\hat{z}_{i,j} - m_{i,j}| > \tau$, where $m_{i,j}$ is the neighborhood median, then $\hat{z}_{i,j}$ is set to $m_{i,j}$. This refinement ensures that the final DEM is both smooth and detailed, rendering it suitable for advanced terrain analysis and visualization applications.
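A minimal sketch of this refinement using SciPy's median filter; the threshold $\tau$ and the 3 × 3 window are illustrative values, not parameters reported in the paper:

```python
import numpy as np
from scipy.ndimage import median_filter

def refine_dem(z_hat, tau=1.0, size=3):
    """Threshold-based median refinement: replace a cell's prediction only
    when it deviates from its neighborhood median by more than tau, so
    isolated spikes are removed without blanket smoothing."""
    med = median_filter(z_hat, size=size)
    return np.where(np.abs(z_hat - med) > tau, med, z_hat)
```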

4. DEM Reconstruction Results and Discussion

4.1. Implementation Details

Our proposed hybrid RNN-CNN framework was implemented using the PyTorch 2.1.0 deep learning library. The model was trained on a laptop system equipped with an Intel Core i7-12700H processor, 32 GB DDR4 RAM, 1 TB SSD storage, and an NVIDIA GeForce RTX 3070Ti Laptop GPU with 8 GB VRAM (NVIDIA Corporation, Santa Clara, CA, USA; Intel Corporation, Santa Clara, CA, USA). Key parameters and configurations are as follows:
  • Model Architecture:
    • The hidden state size for the GRU layers in the RNN encoder was set to 32.
    • The number of GRU layers was 2.
    • The CNN component utilized feature channels with dimensions [128, 256, 512, 1024, 1024] across its layers.
  • Training Configuration:
    • The model was trained for 50 epochs.
    • The learning rate (lr) was initialized to $1 \times 10^{-4}$.
    • The batch size for both training and testing DataLoaders was set to 1.
    • We employed the Adam optimizer with a weight decay of $1 \times 10^{-5}$ (see the training-loop sketch following this list).
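A minimal training-loop sketch wiring these hyperparameters together; model, train_loader, and the batch keys are placeholders for the components sketched in Section 3, and only the listed settings are taken from the text:

```python
import torch

# Adam with lr = 1e-4 and weight decay = 1e-5, as configured above.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)

for epoch in range(50):                        # 50 training epochs
    for batch in train_loader:                 # DataLoader with batch_size=1
        optimizer.zero_grad()
        logits, z_pred = model(batch["points"])     # hypothetical batch keys
        loss = total_loss(logits, batch["classes"], batch["mask"],
                          z_pred, batch["dem"])     # L_cls + lambda * L_TPI
        loss.backward()
        optimizer.step()
```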

4.2. Baseline DEM Generation for Comparative Analysis

To provide a comprehensive comparative analysis of our proposed end-to-end method, baseline DEMs were generated from the raw LiDAR point clouds using established traditional filtering and interpolation techniques. This conventional process typically involves two main steps: first, robustly identifying and classifying ground points from the raw point cloud, and second, interpolating these identified ground points onto a regular grid to form a continuous DEM.
For the ground point extraction phase, two widely recognized morphological filtering algorithms were employed to process the raw LiDAR point clouds:
  • Progressive Morphological Filter (PMF): This algorithm iteratively applies morphological opening operations with increasing window sizes to effectively distinguish and remove non-ground objects, leaving behind bare-earth ground points.
  • Simple Morphological Filter (SMRF): This method utilizes a series of basic morphological operations to robustly classify ground and non-ground points based on elevation differences and geometric properties within local neighborhoods.
After ground points were successfully extracted by both PMF and SMRF, these filtered point clouds were then used as input for DEM generation via two standard spatial interpolation methods:
  • Linear Interpolation: This technique involves creating a Triangulated Irregular Network (TIN) from the filtered ground points. Elevation values for grid cells are then determined by linear interpolation within the triangles that cover the respective grid locations.
  • Nearest-Neighbor Interpolation: This simple method assigns the elevation of the closest ground point in the filtered dataset to each corresponding grid cell within the DEM.
The DEMs derived from these combinations of filtering and interpolation (PMF + Linear, PMF + Nearest Neighbor, SMRF + Linear, SMRF + Nearest Neighbor) served as the primary baselines for quantitative and qualitative comparison against our proposed hybrid RNN-CNN approach. This comparative setup allows us to critically evaluate the performance of our end-to-end framework against established traditional two-step workflows in reconstructing high-precision DEMs from raw LiDAR data.
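For reference, a hedged sketch of one such two-step baseline using PDAL's morphological ground filters and SciPy gridding; filters.pmf and filters.smrf are PDAL's stage names, while the remaining wiring (function name, grid arguments) is illustrative:

```python
import json
import pdal                                   # PDAL Python bindings
from scipy.interpolate import griddata

def baseline_dem(las_path, grid_x, grid_y,
                 filt="filters.smrf", method="linear"):
    """Two-step baseline: morphological ground filtering (filters.pmf or
    filters.smrf), then 'linear' (TIN-based) or 'nearest' gridding of the
    retained ground points (LAS class 2)."""
    pipeline = pdal.Pipeline(json.dumps([
        las_path,                                          # LAS/LAZ reader
        {"type": filt},                                    # classify ground
        {"type": "filters.range", "limits": "Classification[2:2]"},
    ]))
    pipeline.execute()
    pts = pipeline.arrays[0]
    return griddata((pts["X"], pts["Y"]), pts["Z"],
                    (grid_x, grid_y), method=method)
```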

4.3. Visual Comparison Analysis

Figure 3 provides a comprehensive comparative analysis of DEMs produced using various methodological approaches. This visual comparison demonstrates distinct differences in reconstruction quality and spatial accuracy across different techniques. Qualitative assessment reveals significant deficiencies in traditional interpolation approaches, even when preceded by morphological filtering. Both linear and nearest-neighbor interpolation methods yield DEMs with reduced quality.
Specifically, the PMF, while effective at removing many non-ground features, often results in small gaps or voids within the filtered ground point cloud. Subsequent interpolation methods may struggle to accurately fill these fine-scale absences, leading to minor discontinuities. Conversely, the SMRF can incompletely filter out certain non-ground points, such as scattered trees, leading to their erroneous inclusion in the interpolated DEMs and manifesting as artificial elevation variations. Additionally, linear interpolation methods often introduce distinct linear artifacts along the edges of the generated DEMs, particularly noticeable at block boundaries or where data density changes.
In contrast, our proposed hybrid RNN-CNN method addresses these challenges effectively. Its end-to-end learning framework, designed to process raw point clouds and leverage multi-scale TPI constraints, reconstructs terrain surfaces with improved continuity. The method reduces the appearance of gaps or extraneous features and does not introduce the characteristic linear edge artifacts observed with traditional linear interpolation. This results in DEMs that are more consistent and suitable for applications requiring high-fidelity terrain representation.
Furthermore, to provide a more detailed understanding of our method’s performance at a fine scale, Figure 4 presents magnified views comparing DEMs generated by our method against the ground truth. These close-up visualizations reveal that, in the vast majority of cases, our reconstructed DEM closely matches the ground truth, affirming its high fidelity. However, in rare instances, minor imperfections may still be observed, such as extremely small, isolated gaps or occasional points exhibiting minor height discrepancies. These subtle deviations are inherent challenges when processing complex real-world LiDAR data, yet they do not significantly detract from the overall high quality and consistency of the DEMs produced by our framework.

4.4. Quantitative Accuracy Evaluation

In the evaluation of DEM quality, aside from qualitative visual analysis, accuracy is a critical metric. To comprehensively quantify the deviation between the predicted DEM and the ground truth, we adopted the following evaluation metrics: root mean square error (RMSE) and mean absolute error (MAE). Their mathematical formulations are expressed as follows:
  • RMSE:
    $$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{z}_i - z_i \right)^2}$$
    where $\hat{z}_i$ is the predicted elevation, and $z_i$ is the corresponding reference value.
  • MAE:
    $$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{z}_i - z_i \right|$$
    providing a measure of the average absolute error across the DEM; both metrics are computed per grid cell, as in the sketch below.
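A small NumPy sketch of this per-cell evaluation (names are illustrative):

```python
import numpy as np

def dem_errors(z_pred, z_ref):
    """Grid-cell RMSE and MAE against the reference DEM, skipping cells
    without valid ground truth (encoded here as NaN)."""
    diff = (z_pred - z_ref)[~np.isnan(z_ref)]
    return np.sqrt(np.mean(diff ** 2)), np.mean(np.abs(diff))
```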
Table 2 presents a quantitative accuracy assessment comparing our proposed method against traditional linear and nearest-neighbor interpolation techniques as well as methods combining filtering with interpolation across two distinct experimental scenarios. Accuracy was evaluated by comparing each grid cell of the reconstructed DEMs with the corresponding cell in a high-precision ground truth DEM.
In Experiment 1, which used data from steep slope terrain (Data1, 2, and 3), our proposed method consistently achieved the lowest error metrics. Across these datasets, the RMSE of our method ranged from 1.026 to 1.306 m, and the MAE from 0.706 to 0.946 m. In contrast, after filtering, methods such as PMF+Nearest, PMF+Linear, SMRF+Nearest, and SMRF+Linear produced notably higher errors than our proposed method. For instance, across Experiment 1 datasets, their RMSEs ranged from approximately 1.091 to 6.523 m, and their MAEs from 0.799 to 2.688 m. The RMSEs of our method were approximately 1.5 to 6 times lower, and the MAEs approximately 1.5 to 3.5 times lower, than those of these filtered-interpolation baselines, indicating its superior performance in steep slope environments.
In Experiment 2, which used data from undulating terrain (Data4, 5, and 6), the performance of our method was further validated. Our approach achieved RMSEs ranging from 1.560 to 2.855 m and MAEs from 0.686 to 0.978 m. In contrast, all filtered-interpolation baselines (PMF/SMRF with Linear/Nearest interpolation) consistently exhibited higher error magnitudes. For example, the RMSEs of PMF+Nearest ranged from 2.668 to 3.984 m, while those of SMRF+Nearest were notably higher, ranging from 6.057 to 9.283 m. These results confirm that, although filtering can reduce errors compared to basic interpolation, our end-to-end method maintains significantly higher accuracy across varying terrain complexities, even in relatively smoother undulating areas.
The significant difference in accuracy between our method and traditional interpolation approaches (including those combining PMF and SMRF filtering with linear or nearest-neighbor interpolation) arises from fundamental architectural advantages. Our pipeline directly generates the DEM from raw point clouds through an end-to-end learning process, learning to implicitly filter non-ground points and, critically, to reconstruct a high-precision DEM under multi-scale TPI supervision. In the traditional workflows, although the filtering step can improve ground point identification, the subsequent interpolation step simply connects the filtered points without the benefit of learned topographic patterns or multi-scale semantic constraints. By integrating end-to-end feature learning with TPI supervision, our method simultaneously suppresses non-ground noise, preserves critical terrain details, and ensures geomorphometric fidelity, resulting in significantly improved accuracy.

4.5. Computational Efficiency Analysis

Table 3 presents a computational performance comparison evaluating the runtime efficiency of our proposed hybrid RNN-CNN framework against established DEM generation approaches across six diverse point cloud datasets. The analysis examines our method’s performance relative to PMF and SMRF variants, each combined with nearest-neighbor and linear interpolation techniques.
Our experimental results show that the proposed framework achieves faster processing times across all tested datasets, with performance improvements ranging from 11.3% to 51.7% compared to conventional approaches. This consistent computational advantage indicates the effectiveness of integrating multi-scale TPI features within the hybrid neural network architecture for point cloud-to-DEM conversion.
The performance patterns provide insights into the method’s computational characteristics. Improvements are more pronounced on smaller datasets, where the hybrid RNN-CNN pipeline can process point cloud data more efficiently. On larger datasets, the improvements remain consistent, suggesting that our approach scales appropriately across different point cloud densities and spatial extents.
Analysis across different baseline methods shows that our approach achieves notable advantages over SMRF-based techniques, with improvements typically exceeding 20%. The performance gains against PMF-based methods are generally more moderate but remain consistent, indicating that our multi-scale TPI feature integration approach addresses computational bottlenecks present in traditional morphological filtering workflows applied to point cloud data.
The computational efficiency stems from our hybrid architecture that processes point cloud data through a streamlined RNN-CNN pipeline, eliminating the need for explicit neighborhood search operations and complex geometric interpolation procedures that characterize traditional point cloud processing methods. The integration of multi-scale TPI features allows the network to capture terrain characteristics directly from the point cloud, reducing the computational overhead associated with iterative filtering and interpolation steps.
These performance improvements have practical implications for operational point cloud processing workflows. In applications requiring processing of large point cloud datasets, the consistent efficiency gains translate to meaningful time savings. The method’s reliable performance across diverse point cloud characteristics makes it suitable for automated DEM generation pipelines where consistent processing times are important for operational deployment.

5. Conclusions

This study introduces a lightweight and efficient framework for high-precision DEM reconstruction directly from LiDAR point clouds, effectively balancing computational demands with robust feature representation. Experimental results on the Jiuzhaigou datasets demonstrate that our streamlined approach achieves superior accuracy (in terms of RMSE and MAE) compared to traditional methods, along with superior computational efficiency: processing speed improves by more than 20% on average. Our method particularly excels in complex, steep terrain, accurately capturing details such as ridges and valleys. The integration of multi-scale TPI as a semantic constraint during training proved instrumental in ensuring geomorphometric fidelity and realistic representation of terrain structures.
Despite these strengths, the method is not without limitations. Future work will focus on several improvements:
  • In data preprocessing, we plan to further refine noise suppression and outlier removal strategies to enhance reconstruction stability in extremely complex regions.
  • In model architecture, we will explore multi-scale fusion mechanisms to enhance feature representation across different spatial extents.
  • We will continue to optimize our lightweight network design and develop more efficient parallel computing schemes to further reduce computational complexity and improve large-scale data processing efficiency.
  • Crucially, acknowledging the application-driven nature of our research, future work will also involve a more comprehensive quantitative assessment of the usability of our generated DEMs in specific downstream geospatial applications. This includes, but is not limited to, integrating our reconstructed DEMs into hydrological modeling frameworks to evaluate their impact on derived parameters (e.g., flow accumulation, stream networks) and assessing their suitability for advanced landform classification tasks. Such evaluations will provide direct evidence of our method’s utility beyond merely producing accurate elevation surfaces, thereby demonstrating its full potential for real-world environmental and geomorphological analysis.
In summary, this study provides an efficient, robust, and computationally lightweight solution for high-precision DEM reconstruction from LiDAR point clouds, offering significant advantages for resource-constrained environments while maintaining high precision. The proposed framework not only addresses critical challenges in terrain modeling but also outlines clear directions for ongoing research to further enhance its accuracy, scalability, and practical applicability.

Author Contributions

Conceptualization, R.C.; Methodology, R.C. and B.G.; Validation, R.C.; Formal analysis, R.C. and B.G.; Investigation, R.C., J.W. and J.X.; Writing—original draft, R.C.; Visualization, R.C.; Supervision, C.Y. and H.M.; Project administration, C.Y.; Funding acquisition, C.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (Grant No. 2024YFC3810802), the National Key R&D Program (Grant No. 2018YFB0504500), the National Natural Science Foundation of China (Grant No. 41101417), and the National High Resolution Earth Observations Foundation (Grant No. 11-H37B02-9001-19/22).

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy and proprietary concerns.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. MacMillan, R.; Jones, R.K.; McNabb, D.H. Defining a hierarchy of spatial entities for environmental analysis and modeling using digital elevation models (DEMs). Comput. Environ. Urban Syst. 2004, 28, 175–200. [Google Scholar] [CrossRef]
  2. Yang, J.; Xu, J.; Lv, Y.; Zhou, C.; Zhu, Y.; Cheng, W. Deep learning-based automated terrain classification using high-resolution DEM data. Int. J. Appl. Earth Obs. Geoinf. 2023, 118, 103249. [Google Scholar] [CrossRef]
  3. Siervo, V.; Pescatore, E.; Giano, S.I. Geomorphic analysis and semi-automated landforms extraction in different natural landscapes. Environ. Earth Sci. 2023, 82, 128. [Google Scholar] [CrossRef]
  4. Gallant, J.C.; Austin, J.M. Derivation of terrain covariates for digital soil mapping in Australia. Soil Res. 2015, 53, 895–906. [Google Scholar] [CrossRef]
  5. Armstrong, R.N.; Martz, L.W. Topographic parameterization in continental hydrology: A study in scale. Hydrol. Process. 2003, 17, 3763–3781. [Google Scholar] [CrossRef]
  6. Wood, S.W.; Murphy, B.P.; Bowman, D.M. Firescape ecology: How topography determines the contrasting distribution of fire and rain forest in the south-west of the Tasmanian Wilderness World Heritage Area. J. Biogeogr. 2011, 38, 1807–1820. [Google Scholar] [CrossRef]
  7. Bilodeau, M.F.; Esau, T.J.; Zaman, Q.U.; Heung, B.; Farooque, A.A. Enhancing surface drainage mapping in eastern Canada with deep learning applied to LiDAR-derived elevation data. Sci. Rep. 2024, 14, 10016. [Google Scholar] [CrossRef]
  8. Li, B.; Lu, H.; Wang, H.; Qi, J.; Yang, G.; Pang, Y.; Dong, H.; Lian, Y. Terrain-Net: A highly-efficient, parameter-free, and easy-to-use deep neural network for ground filtering of UAV LiDAR data in forested environments. Remote Sens. 2022, 14, 5798. [Google Scholar] [CrossRef]
  9. Li, W.; Hsu, C.-Y. Automated terrain feature identification from remote sensing imagery: A deep learning approach. Int. J. Geogr. Inf. Sci. 2020, 34, 637–660. [Google Scholar] [CrossRef]
  10. Altman, N.S. An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 1992, 46, 175–185. [Google Scholar] [CrossRef]
  11. Heritage, G.L.; Milan, D.J.; Large, A.R.; Fuller, I.C. Influence of survey strategy and interpolation model on DEM quality. Geomorphology 2009, 112, 334–344. [Google Scholar] [CrossRef]
  12. Li, Y.; Yong, B.; Van Oosterom, P.; Lemmens, M.; Wu, H.; Ren, L.; Zheng, M.; Zhou, J. Airborne LiDAR data filtering based on geodesic transformations of mathematical morphology. Remote Sens. 2017, 9, 1104. [Google Scholar] [CrossRef]
  13. Vosselman, G. Slope based filtering of laser altimetry data. Int. Arch. Photogramm. Remote Sens. 2000, 33, 935–942. [Google Scholar]
  14. Lin, X.; Zhang, J. Segmentation-based filtering of airborne LiDAR point clouds by progressive densification of terrain segments. Remote Sens. 2014, 6, 1294–1326. [Google Scholar] [CrossRef]
  15. Zhang, K.; Chen, S.C.; Whitman, D.; Shyu, M.L.; Yan, J.; Zhang, C. A progressive morphological filter for removing nonground measurements from airborne LIDAR data. IEEE Trans. Geosci. Remote Sens. 2003, 41, 872–882. [Google Scholar] [CrossRef]
  16. Mo, Y.; Zhong, R.; Sun, H.; Wu, Q.; Du, L.; Geng, Y.; Cao, S. Integrated airborne LiDAR data and imagery for suburban land cover classification using machine learning methods. Sensors 2019, 19, 1996. [Google Scholar] [CrossRef] [PubMed]
  17. Veronesi, F.; Hurni, L. Random Forest with semantic tie points for classifying landforms and creating rigorous shaded relief representations. Geomorphology 2014, 224, 152–160. [Google Scholar] [CrossRef]
  18. Tan, Y.-C.; Duarte, L.; Teodoro, A.C. Comparative Study of Random Forest and Support Vector Machine for Land Cover Classification and Post-Wildfire Change Detection. Land 2024, 13, 1878. [Google Scholar] [CrossRef]
  19. Qian, Y.; Zhou, W.; Yan, J.; Li, W.; Han, L. Comparing Machine Learning Classifiers for Object-Based Land Cover Classification Using Very High Resolution Imagery. Remote Sens. 2015, 7, 153–168. [Google Scholar] [CrossRef]
  20. Hengl, T.; Rossiter, D.G. Supervised Landform Classification to Enhance and Replace Photo-Interpretation in Semi-Detailed Soil Survey. Soil Sci. Soc. Am. J. 2003, 67, 1810–1822. [Google Scholar] [CrossRef]
  21. Nikparvar, B.; Thill, J.-C. Machine Learning of Spatial Data. ISPRS Int. J. Geo-Inf. 2021, 10, 600. [Google Scholar] [CrossRef]
  22. Yuan, X.; Zhu, J.; Lei, H.; Peng, S.; Wang, W.; Li, X. Duplex-Hierarchy Representation Learning for Remote Sensing Image Classification. Sensors 2024, 24, 1130. [Google Scholar] [CrossRef]
  23. Zhao, H.; Jiang, L.; Fu, C.W.; Jia, J. PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 5560–5568. [Google Scholar] [CrossRef]
  24. Lolli, S. Machine learning techniques for vertical lidar-based detection, characterization, and classification of aerosols and clouds: A comprehensive survey. Remote Sens. 2023, 15, 4318. [Google Scholar] [CrossRef]
  25. Dong, F.; Jin, J.; Li, L.; Li, H.; Zhang, Y. A Multi-Scale Content-Structure Feature Extraction Network Applied to Gully Extraction. Remote Sens. 2024, 16, 3562. [Google Scholar] [CrossRef]
  26. Azmoon, B.; Biniyaz, A.; Liu, Z. Use of High-Resolution Multi-Temporal DEM Data for Landslide Detection. Geosciences 2022, 12, 378. [Google Scholar] [CrossRef]
  27. Janssens-Coron, E.; Guilbert, E. Ground point filtering from airborne lidar point clouds using deep learning: A preliminary study. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, 42, 1559–1565. [Google Scholar] [CrossRef]
  28. Zhao, G.; Cai, Z.; Wang, X.; Dang, X. GAN Data Augmentation Methods in Rock Classification. Appl. Sci. 2023, 13, 5316. [Google Scholar] [CrossRef]
  29. Magalhães, I.A.L.; de Carvalho Júnior, O.A.; de Carvalho, O.L.F.; de Albuquerque, A.O.; Hermuche, P.M.; Merino, É.R.; Gomes, R.A.T.; Guimarães, R.F. Comparing Machine and Deep Learning Methods for the Phenology-Based Classification of Land Cover Types in the Amazon Biome Using Sentinel-1 Time Series. Remote Sens. 2022, 14, 4858. [Google Scholar] [CrossRef]
  30. Zhao, X.; Su, Y.; Li, W.; Hu, T.; Liu, J.; Guo, Q. A Comparison of LiDAR Filtering Algorithms in Vegetated Mountain Areas. Can. J. Remote Sens. 2018, 44, 287–298. [Google Scholar] [CrossRef]
  31. Mohan, A.; Singh, A.K.; Kumar, B.; Dwivedi, R. Review on remote sensing methods for landslide detection using machine and deep learning. Trans. Emerg. Telecommun. Technol. 2021, 32, e3998. [Google Scholar] [CrossRef]
  32. Jin, S.; Su, Y.; Zhao, X.; Hu, T.; Guo, Q. A point-based fully convolutional neural network for airborne LiDAR ground point filtering in forested environments. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 3958–3974. [Google Scholar] [CrossRef]
  33. Zhu, X.; Zhou, H.; Wang, T.; Hong, F.; Li, W.; Ma, Y.; Li, H.; Yang, R.; Lind, D. Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-Based Perception. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 6807–6822. [Google Scholar] [CrossRef]
  34. Dong, S.; Chen, Z. A Multi-Level Feature Fusion Network for Remote Sensing Image Segmentation. Sensors 2021, 21, 1267. [Google Scholar] [CrossRef]
  35. Jiao, D.; Wang, D.; Lv, H.; Peng, Y. Super-resolution reconstruction of a digital elevation model based on a deep residual network. Open Geosci. 2020, 12, 1369–1382. [Google Scholar] [CrossRef]
  36. Lin, X.; Zhang, Q.; Wang, H.; Yao, C.; Chen, C.; Cheng, L.; Li, Z. A DEM super-resolution reconstruction network combining internal and external learning. Remote Sens. 2022, 14, 2181. [Google Scholar] [CrossRef]
  37. Xu, Z.; Chen, Z.; Yi, W.; Gui, Q.; Hou, W.; Ding, M. Deep gradient prior network for DEM super-resolution: Transfer learning from image to DEM. ISPRS J. Photogramm. Remote Sens. 2019, 150, 80–90. [Google Scholar] [CrossRef]
  38. Han, X.; Ma, X.; Li, H.; Chen, Z. A global-information-constrained deep learning network for digital elevation model super-resolution. Remote Sens. 2023, 15, 305. [Google Scholar] [CrossRef]
  39. Luo, Y.; Ma, H.; Zhou, L. DEM retrieval from airborne LiDAR point clouds in mountain areas via deep neural networks. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1770–1774. [Google Scholar] [CrossRef]
  40. Jenness, J. Topographic Position Index (tpi_jen.avx) Extension for ArcView 3.x, v. 1.3a; Jenness Enterprises, 2006. Available online: http://www.jennessent.com/arcview/tpi.htm (accessed on 6 August 2025).
  41. Weiss, A. Topographic position and landforms analysis. In Proceedings of the Poster Presentation, ESRI User Conference, San Diego, CA, USA, 9–13 July 2001; Volume 200. [Google Scholar]
  42. Zhang, L.; Zhang, L. Artificial intelligence for remote sensing data analysis: A review of challenges and opportunities. IEEE Geosci. Remote Sens. Mag. 2022, 10, 270–294. [Google Scholar] [CrossRef]
Figure 1. Location and the DEM of the study area. This figure displays the Jiuzhaigou study area’s location and the DEM used for analysis.
Figure 2. Illustration of the comprehensive workflow for generating a high-precision DEM from LiDAR point clouds.
Figure 3. Visual comparison of DEM reconstruction results across different terrain types. Data1, 2, and 3 represent slope terrain, while Data4, 5, and 6 represent undulating terrain. From left to right, columns illustrate (A) original point cloud, (B) our proposed method, (C) PMF + Nearest Neighbor interpolation, (D) PMF + Linear interpolation, (E) SMRF + Nearest Neighbor interpolation, (F) SMRF + Linear interpolation, and (G) ground truth DEMs.
Figure 4. Magnified views of the high-resolution DEM generated by our proposed method, compared against the ground truth. The close-ups demonstrate the preservation of fine-scale topographic features and smooth surface continuity across complex terrain.
Table 1. The details of the data in Jiuzhaigou.

Property | Value
Altitude of points | 2306–4306 m
Point density | 15 pts/m²
LiDAR scanner type | Riegl VQ-1560i
Overlap of flight lines | 25%
Horizontal accuracy | 25–30 cm
Vertical accuracy | 15 cm
Flight platform | Cessna 208B aircraft
Table 2. Quantitative accuracy assessment of DEMs.

Experiment | Data | Metric | Ours | PMF + Nearest | PMF + Linear | SMRF + Nearest | SMRF + Linear
Experiment 1 | Data1 | RMSE (m) | 1.058 | 1.091 | 1.130 | 1.151 | 1.019
 | | MAE (m) | 0.706 | 0.799 | 0.817 | 0.844 | 0.762
 | Data2 | RMSE (m) | 1.306 | 2.420 | 1.804 | 5.985 | 5.334
 | | MAE (m) | 0.946 | 1.406 | 1.252 | 2.353 | 2.183
 | Data3 | RMSE (m) | 1.026 | 2.906 | 2.746 | 6.523 | 6.224
 | | MAE (m) | 0.786 | 1.434 | 1.386 | 2.688 | 2.680
Experiment 2 | Data4 | RMSE (m) | 2.855 | 3.214 | 2.125 | 9.283 | 8.275
 | | MAE (m) | 0.978 | 1.341 | 1.149 | 2.981 | 2.837
 | Data5 | RMSE (m) | 1.640 | 2.668 | 2.144 | 6.057 | 5.593
 | | MAE (m) | 0.686 | 1.459 | 1.358 | 2.644 | 2.571
 | Data6 | RMSE (m) | 1.560 | 3.984 | 3.626 | 8.266 | 7.834
 | | MAE (m) | 0.725 | 1.477 | 1.422 | 3.116 | 3.087
Table 3. Processing time analysis for different DEM generation approaches.

Experiment | Data | Ours | PMF + Nearest | PMF + Linear | SMRF + Nearest | SMRF + Linear
Experiment 1 | Data1 | 7.74 s | 13.22 s | 10.95 s | 16.01 s | 13.74 s
 | Data2 | 213.81 s | 238.61 s | 234.83 s | 254.85 s | 247.63 s
 | Data3 | 113.67 s | 142.16 s | 125.67 s | 148.81 s | 142.24 s
Experiment 2 | Data4 | 202.63 s | 217.62 s | 218.51 s | 249.79 s | 236.87 s
 | Data5 | 123.47 s | 142.85 s | 139.21 s | 167.06 s | 160.66 s
 | Data6 | 160.32 s | 177.23 s | 177.31 s | 197.38 s | 192.52 s
Time measurements for the proposed method represent the end-to-end DEM reconstruction duration from raw LiDAR point clouds. For traditional baseline methods (PMF/SMRF + Linear/Nearest Neighbor), time measurements include both the ground filtering process and the subsequent interpolation step. All timings were recorded on the same hardware configuration.
