1. Introduction
Digital Terrain Models (DTMs) are fundamental tools for analyzing and modeling Earth surface processes, supporting applications in hydrology, geomorphology, natural hazard assessment, and environmental planning [1,2,3]. They provide raster-based, geo-referenced representations of the Earth's surface elevation, capturing topographic variability essential for surface water modeling, landscape evolution, and risk analysis. High-resolution DTMs, typically derived from airborne LiDAR or UAV photogrammetry, offer detailed depictions of terrain morphology but are limited by high acquisition costs, complex logistics, and restricted spatial coverage, particularly in remote or densely vegetated areas [4,5,6].
In contrast, global DTMs such as the Shuttle Radar Topography Mission (SRTM), ALOS World 3D, and Copernicus GLO-30 provide near-global coverage at spatial resolutions of 30 m or more, enabling regional to global-scale studies [7,8]. However, these coarser models often fail to represent fine-scale topographic details, especially in mountainous or urban environments, reducing their suitability for high-precision modeling tasks. This trade-off between spatial resolution and coverage has motivated the development of super-resolution (SR) methods that aim to computationally enhance the resolution of coarse DTMs, providing a cost-effective alternative for terrain refinement [9,10,11,12].
Super-resolution techniques, originally developed in the image-processing field, reconstruct high-resolution images from one or more low-resolution inputs [13]. When applied to elevation data, the challenge lies in recovering fine-scale geomorphological structures from sparse inputs. Traditional SR approaches can be broadly classified into interpolation-based, reconstruction-based, and learning-based methods [14,15]. Interpolation approaches such as bilinear, bicubic, Kriging, or Inverse Distance Weighting (IDW) are simple and computationally efficient but tend to over-smooth terrain features, failing to preserve ridges, cliffs, or slopes [16,17]. Reconstruction-based methods use gradient or edge constraints to improve surface detail, yet their effectiveness declines under high upscaling factors or in heterogeneous terrain [18,19].
Learning-based methods, especially deep learning (DL) approaches, have recently shown great promise by learning non-linear relationships between low-resolution and high-resolution elevation data. Early frameworks relied on manifold learning [20], sparse coding [21], or patch-based dictionary matching [10], but these were computationally expensive. The introduction of convolutional neural networks (CNNs) enabled end-to-end training from paired datasets, and models such as the Super-Resolution Convolutional Neural Network (SRCNN), Enhanced Deep Residual Networks (EDSR), the Super-Resolution Generative Adversarial Network (SRGAN), Enhanced SRGAN (ESRGAN), and the Residual Channel Attention Network (RCAN), originally designed for natural image enhancement, have been successfully adapted for DTM super-resolution [22,23,24,25,26,27]. Among these, RCAN is particularly effective due to its ability to recover high-frequency terrain details using residual blocks and channel-wise attention [28].
Despite these advances, applying image-based models directly to elevation data presents unique challenges. DTMs represent continuous surfaces with geometric properties such as slope, aspect, and curvature, which are not typically considered in models trained on natural images. As a result, standard CNN-based methods may produce elevation inconsistencies, loss of geomorphological structure, or artifacts affecting drainage networks and valley boundaries [29,30]. Furthermore, deep learning models often generalize poorly across different terrain types and rarely incorporate topographic constraints or uncertainty quantification [31,32].
To address these limitations, recent studies have integrated terrain-specific descriptors such as slope, curvature, and roughness into model architectures or loss functions [26,33]. These terrain-aware strategies reinforce physical consistency and improve the realism of reconstructed surfaces. Multi-component loss functions combining pixel-wise accuracy (L1, L2), perceptual similarity (SSIM), and gradient-based terrain consistency have further enhanced both numerical precision and structural coherence [25,34]. Hybrid frameworks, such as detrending-based deep learning (DTDL), have also been proposed to separate large-scale elevation trends from high-frequency residuals, allowing more effective learning of fine-scale patterns [35].
Beyond deep learning, probabilistic and geostatistical approaches have treated DTM super-resolution as a non-unique reconstruction problem, modeling multiple plausible fine-resolution surfaces consistent with the same coarse data [9,36]. Methods based on variograms [37,38] or Multiple-Point Statistics (MPS) using training images [39,40,41] can reproduce spatial structures realistically but are computationally demanding and rely on suitable training data [42,43].
Despite substantial methodological progress, generating accurate high-resolution DTMs remains a challenge, especially in topographically complex regions such as High Mountain Asia [44], where global DTMs like SRTM or ASTER GDEM fail to capture sharp terrain discontinuities. Preserving key landform features such as ridgelines, drainage channels, and valley floors is critical for hydrological and environmental modeling [45,46]. Moreover, performance must be assessed not only through error metrics such as RMSE or MAE but also in terms of geomorphological realism and the consistency of derived terrain parameters [47,48].
The present study proposes a deep learning-based super-resolution framework for DTM enhancement using the RCAN model, optimized through terrain-aware loss functions that incorporate domain-specific elevation and slope information. The research aims to (1) identify and evaluate the most effective loss function for terrain super-resolution and (2) fine-tune the corresponding weights to balance elevation accuracy and structural preservation. Using a 568 km² LiDAR-derived dataset covering diverse terrain types, the model is trained to generate 1 m resolution DTMs from 10 m inputs. Model performance is evaluated through statistical and structural metrics across flat, hilly, and mountainous areas. The results demonstrate that combining deep learning with geomorphologically informed loss functions provides a practical and scalable approach for producing high-resolution DTMs from widely available coarse datasets.
2. Materials and Methods
2.1. Overview of the Workflow
The proposed workflow is structured into two main phases. In the first phase, several custom loss functions were developed and assessed to determine which configuration best improves DTM reconstruction quality. In the second phase, the selected loss function was fine-tuned by adjusting the relative balance between its components to further optimize model performance. The backbone of the super-resolution framework is the Residual Channel Attention Network (RCAN), an advanced neural architecture designed to upscale spatial data while preserving fine-scale structural details. For this study, RCAN was specifically adapted for terrain data by incorporating loss components sensitive to elevation-dependent features such as slope gradients.
2.2. Dataset Description
The dataset used in this study consists of 568 high-resolution Digital Terrain Model (DTM) tiles freely provided by the Italian Ministry of the Environment and Energy Security (MASE) (https://sim.mase.gov.it/portalediaccesso/mappe/#/viewer/new (accessed on 14 November 2024)) (Figure 1). Each tile covers a 1 km × 1 km region and was originally generated from airborne LiDAR acquisitions at a spatial resolution of 1 m. The tiles span a wide range of morphological settings, including flat alluvial plains, rolling hills, and steep mountainous terrain, ensuring that the experimental framework captures diverse topographic conditions. All DTMs were downloaded in GeoTIFF format with complete geospatial metadata, allowing consistent spatial alignment and seamless integration into the processing and modeling pipeline.
2.3. Data Preprocessing
Prior to model training, each high-resolution DTM tile (1 m; 1000 × 1000 pixels) was downsampled to 10 m (100 × 100 pixels) using average aggregation to simulate realistic coarse-resolution DTM data while preserving the underlying terrain structure. The original georeferencing metadata was retained and propagated to both resolutions to ensure consistent spatial reference during evaluation. The resulting paired low- and high-resolution tiles were then divided into training, validation, and testing subsets so that model performance could be assessed on unseen terrain.
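The average-aggregation downsampling described above can be sketched as follows; the function name is illustrative, while the tile dimensions match the dataset description:

```python
import numpy as np

def downsample_average(dtm_hr, factor=10):
    """Downsample a high-resolution DTM by block-averaging.

    Each output cell is the mean of a factor x factor block of input
    cells, mimicking the aggregation used to build the 10 m inputs.
    """
    h, w = dtm_hr.shape
    assert h % factor == 0 and w % factor == 0, "tile must divide evenly"
    return dtm_hr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# Example: a 1000 x 1000 tile at 1 m becomes 100 x 100 at 10 m
tile_1m = np.random.rand(1000, 1000).astype(np.float32)
tile_10m = downsample_average(tile_1m, factor=10)
print(tile_10m.shape)  # (100, 100)
```

Block-averaging (rather than nearest-neighbor decimation) preserves the mean elevation of each block, which keeps the coarse tile hydrologically plausible.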
2.4. RCAN Model Architecture
The network used in this study builds on the Residual Channel Attention Network (RCAN), with several adaptations made to accommodate the characteristics of single-band Digital Terrain Models (DTMs) and the required 10× spatial upscaling (Figure 2). The original RCAN organizes its feature extraction into multiple Residual Groups (RGs), each containing several Residual Channel Attention Blocks (RCABs). For the present work, this structure was simplified to a single sequence of ten RCABs. This adjustment reduces computational load while still allowing the network to extract detailed spatial information from elevation data.
Each RCAB retains the standard components of RCAN: two convolution layers with ReLU activation and a Squeeze-and-Excitation channel-attention module. All convolution layers use reflect padding rather than the zero padding adopted in the original implementation. Reflect padding helps reduce boundary artifacts that can otherwise appear along tile edges and propagate through the network—an important consideration when working with geospatial datasets.
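A minimal PyTorch sketch of one such RCAB is given below. The 64-channel feature width and reduction ratio of 16 are illustrative assumptions (they are the defaults of the original RCAN and are not specified in the text):

```python
import torch
import torch.nn as nn

class RCAB(nn.Module):
    """Residual Channel Attention Block (sketch): two 3x3 convolutions
    with reflect padding and ReLU, followed by a squeeze-and-excitation
    channel-attention gate applied to the residual branch."""

    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, padding_mode="reflect"),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, padding_mode="reflect"),
        )
        # Squeeze-and-excitation: global pooling -> bottleneck -> sigmoid gate
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        res = self.body(x)
        return x + res * self.attention(res)

x = torch.randn(1, 64, 32, 32)
out = RCAB()(x)
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

Note the `padding_mode="reflect"` argument, which implements the reflect padding discussed above directly inside each convolution.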
A more substantial modification concerns the upsampling module. The classical RCAN uses pixel-shuffle layers designed for integer upscaling factors (e.g., 2× or 4×). Since the task here requires a non-integer global scaling factor of 10× (from 10 m to 1 m), a progressive interpolation strategy was introduced. The final upsampling unit consists of three stages: an initial 2.5× bilinear interpolation, followed by two 2× bilinear interpolation steps. Each stage is paired with a convolution layer and a LeakyReLU activation to refine intermediate representations. This approach avoids the checkerboard artifacts often associated with sub-pixel convolution when applied to non-integer scaling and provides more stable optimization during training.
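The three-stage progressive upsampler described above can be sketched as follows; the 64-channel feature width is an illustrative assumption:

```python
import torch
import torch.nn as nn

class ProgressiveUpsampler(nn.Module):
    """Progressive 10x upsampling (2.5x -> 2x -> 2x) via bilinear
    interpolation, each stage refined by a convolution + LeakyReLU."""

    def __init__(self, channels=64):
        super().__init__()
        stages = []
        for scale in (2.5, 2.0, 2.0):
            stages += [
                nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False),
                nn.Conv2d(channels, channels, 3, padding=1, padding_mode="reflect"),
                nn.LeakyReLU(0.2, inplace=True),
            ]
        self.stages = nn.Sequential(*stages)

    def forward(self, x):
        return self.stages(x)

# A 20 x 20 feature map grows to 200 x 200 (a 100 x 100 10 m tile
# would grow to 1000 x 1000, i.e. 1 m resolution, the same way).
feats = torch.randn(1, 64, 20, 20)
up = ProgressiveUpsampler()(feats)
print(up.shape)  # torch.Size([1, 64, 200, 200])
```

Because `nn.Upsample` accepts non-integer scale factors, the 2.5× first stage needs no sub-pixel convolution, which is what avoids the checkerboard artifacts mentioned above.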
The architecture was also adapted to the single-channel structure of DTMs by replacing the RGB-based input and output layers of the original model with one-channel convolution kernels. This ensures that the network handles elevation values consistently and preserves the metric nature of the data.
2.5. Custom Loss Functions
To adapt RCAN to the physical characteristics of terrain data, four loss formulations (LF1–LF4) were designed and tested. Each loss integrates pixel-wise error terms with additional components intended to guide the model toward reproducing local slopes, multiscale structure, or elevation-dependent variations.
Let Ŷ ∈ ℝ^(B×1×H×W) be the predicted DTM batch and Y ∈ ℝ^(B×1×H×W) the ground truth. Index pixels by p = (i, j) and let N = H × W be the per-sample pixel count. Define the forward finite differences
∇ₓZ(i, j) = Z(i, j + 1) − Z(i, j),  ∇ᵧZ(i, j) = Z(i + 1, j) − Z(i, j),
which approximate the local slope components of a surface Z.
2.5.1. LF1
LF1 combines L1 and L2 errors with a slope-based term derived from the horizontal and vertical finite differences. This formulation encourages both absolute accuracy and consistency in local terrain gradients.
2.5.2. LF2
LF2 pairs the L1 loss with an elevation-weighted term. The weighting factor assigns higher importance to elevations further from the mean level of each tile, encouraging the network to reduce bias in areas where height variations are more pronounced.
2.5.3. LF3
LF3 incorporates a Laplacian-pyramid representation, allowing errors to be evaluated at multiple spatial scales. Define B(·) as a Gaussian blur (kernel size 5, σ = 1.0, reflect padding) and D(·) as average-pool downsampling by a factor of 2. With L = 3 pyramid levels, the Laplacian residual at level k is the difference between the level-k surface and its blurred version, Lₖ(Z) = Gₖ(Z) − B(Gₖ(Z)), where G₀(Z) = Z and Gₖ₊₁(Z) = D(B(Gₖ(Z))). The multiscale residuals emphasize terrain discontinuities and break lines, while an additional gradient term enforces local slope consistency.
2.5.4. LF4
LF4 combines the L1 loss with the slope-based gradient penalty. This simpler formulation targets elevation accuracy while explicitly constraining local gradients, making it well suited to preserving terrain geometry.
Each loss function was tested by training RCAN for 100 epochs using the same training and validation sets and the same 2.5× → 2× → 2× progressive upsampling module. Model selection relied on validation RMSE and SSIM. Among the four candidates, LF4 provided the most stable and accurate results and was therefore examined further through a small grid search over the weighting parameters (α, γ), with the final configuration selected using the same validation-based procedure. The four configurations (LF1–LF4) constitute an ablation study designed to isolate the contribution of each loss component: because all models share an identical architecture and upsampling module, the observed performance differences arise solely from the loss terms, allowing a direct assessment of the elevation-fidelity, slope-consistency, and elevation-weighted components.
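As described, LF4 is a weighted sum of an L1 elevation term and a forward-finite-difference slope term. A sketch of this loss follows; the reduction and implementation details of the study's exact code may differ:

```python
import torch
import torch.nn.functional as F

def lf4_loss(pred, target, alpha=0.5, gamma=0.5):
    """Sketch of LF4: alpha * L1 elevation error + gamma * slope term.

    Slopes are forward finite differences along x and y; inputs are
    (B, 1, H, W) elevation grids.
    """
    l1 = F.l1_loss(pred, target)
    # Forward differences approximate the local slope components
    dx_p = pred[..., :, 1:] - pred[..., :, :-1]
    dx_t = target[..., :, 1:] - target[..., :, :-1]
    dy_p = pred[..., 1:, :] - pred[..., :-1, :]
    dy_t = target[..., 1:, :] - target[..., :-1, :]
    slope = F.l1_loss(dx_p, dx_t) + F.l1_loss(dy_p, dy_t)
    return alpha * l1 + gamma * slope

pred = torch.randn(2, 1, 64, 64)
loss = lf4_loss(pred, pred.clone())
print(loss.item())  # 0.0 for identical prediction and target
```

The α and γ arguments correspond directly to the weights explored in the fine-tuning experiments of Section 3.3.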
2.6. Training Setup and Implementation
All experiments were conducted using the PyTorch 1.0.0 framework on Google Colab (V3) equipped with a Tesla T4 GPU. The learning rate was fixed at 5 × 10−6, the batch size was set to 4, and random seeds were initialized for reproducibility. Model checkpoints were saved periodically, and the final model was selected based on the lowest validation RMSE across epochs.
2.7. Evaluation Metrics
Model performance was assessed on a per-tile basis using standard statistical and perceptual metrics, including Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index (SSIM). Visual assessments were also performed through signed error maps and residual histograms to identify spatial patterns of error, particularly in areas with steep slopes or complex morphology.
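The per-tile statistical metrics can be computed as in the sketch below (SSIM is omitted, as it is usually taken from a library such as scikit-image; for elevation data, PSNR here uses the elevation range of the reference tile as the data range, which is one common convention rather than a detail stated by the study):

```python
import numpy as np

def dtm_metrics(pred, truth, data_range=None):
    """Per-tile RMSE, MAE, and PSNR for elevation grids."""
    err = pred - truth
    mse = float(np.mean(err ** 2))
    rmse = float(np.sqrt(mse))
    mae = float(np.mean(np.abs(err)))
    if data_range is None:
        data_range = float(truth.max() - truth.min())
    psnr = float(20 * np.log10(data_range) - 10 * np.log10(mse))
    return {"rmse": rmse, "mae": mae, "psnr": psnr}

# A constant +1 m error yields RMSE = MAE = 1.0 and, with a 100 m
# elevation range, PSNR = 40 dB.
truth = np.linspace(100.0, 200.0, 10000).reshape(100, 100)
m = dtm_metrics(truth + 1.0, truth)
print(m)  # {'rmse': 1.0, 'mae': 1.0, 'psnr': 40.0}
```

Signed error maps used for the visual assessment are simply `pred - truth` rendered spatially, so the same `err` array serves both purposes.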
2.8. Use of Generative AI
Generative Artificial Intelligence (GenAI) tools, specifically ChatGPT 3.5 (OpenAI, San Francisco, CA, USA), were used to assist in the language refinement and formatting of this manuscript. The AI was not used for generating data, designing experiments, analyzing results, or interpreting findings. All scientific content, methods, and conclusions were entirely developed and validated by the authors.
3. Results
3.1. Visual Assessment of Terrain Reconstruction
Figure 3 presents analytical hill shade visualizations for two representative test tiles, comparing the reference LiDAR-derived DTM (1 m), the super-resolved output (1 m), and the original input DTM (10 m). The hill-shaded images highlight clear differences in terrain representation across scales. Compared to the 10 m input, the super-resolved DTMs recover finer geomorphological features, including drainage lines, ridges, and subtle slope transitions, which are largely absent or heavily smoothed in the low-resolution data. While the super-resolved outputs do not fully reproduce all high frequency details present in the reference DTMs, they show a marked improvement in spatial continuity and terrain structure, visually aligning more closely with the reference surfaces.
3.2. Quantitative Evaluation of Loss Functions
To assess the effectiveness of the proposed terrain-aware loss formulations, four distinct loss functions (LF1–LF4) were implemented and tested within the RCAN-based DTM super-resolution framework. Each configuration was trained independently for 100 epochs under identical conditions, including model architecture, dataset, and training parameters, to ensure a consistent basis for comparison. The optimal checkpoint for each configuration was selected according to the minimum validation loss.
Model performance was quantitatively assessed using the Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index (SSIM). The averaged results across all test tiles are summarized in Table 1. All loss functions were evaluated using the same progressive 2.5× → 2× → 2× upsampling module, meaning that the architectural contribution to error is constant across configurations. As a result, the RMSE values of the well-behaved loss functions fall within a relatively narrow range, reflecting the smoothing effect imposed by the interpolation-based upsampling and the corresponding error floor it introduces for recovering high-frequency terrain detail.
Within this shared architectural context, LF4, which combines an L1 elevation term with a slope-consistency component, achieved the best overall performance. It obtained the lowest test RMSE (1.647 ± 0.490) and MAE (1.217 ± 0.382), along with a high SSIM (0.992 ± 0.004) and the highest PSNR (49.304 ± 3.493 dB), indicating strong generalization and improved preservation of both elevation fidelity and terrain structure (Figure 4).
In contrast, LF2, which applies an elevation-weighted L1 penalty relative to deviation from the mean, produced by far the weakest results, with a test RMSE of 36.843 ± 13.991 and an MAE of 24.140 ± 9.442. This catastrophic drop in accuracy reflects a fundamental limitation of the weighting strategy: the exponential elevation-based weights heavily down-weight high-relief regions, causing the loss signal to collapse precisely where terrain variability is highest. As a result, the model receives almost no gradient information on slopes, ridges, and sharp discontinuities, leading to unstable optimization and highly inconsistent reconstruction behavior (Figure 5).
The distribution of tile-wise statistics for all tested loss functions is illustrated in Figure 6, highlighting that three of the loss formulations exhibit similar performance ranges, while LF4 demonstrates the most accurate and stable results across the dataset.
3.3. Loss Weight Tuning Analysis
To assess the influence of loss weighting on super-resolution performance, three configurations of the combined loss function were tested: (α = 0.8, γ = 0.2), (α = 0.5, γ = 0.5), and (α = 0.2, γ = 0.8), where α represents the weight of the elevation-based term and γ the slope-based term. The objective was to evaluate how emphasizing terrain smoothness versus gradient detail affects model accuracy and generalization.
Visual comparison of the three settings (Figure 7, Figure 8 and Figure 9) showed minimal perceptible differences across sample tiles, confirming that quantitative assessment provides a more reliable evaluation. As summarized in Table 2, the balanced configuration (α = 0.5, γ = 0.5) achieved the lowest RMSE on both the validation (0.28) and testing datasets (1.62 ± 0.50), alongside high PSNR (49.47 ± 3.50 dB) and SSIM (0.99 ± 0.01). This configuration provided the best overall trade-off between elevation accuracy and terrain structure preservation.
Increasing the weight of the slope term (α = 0.2, γ = 0.8) led to a higher RMSE (1.73 ± 0.52) and MAE (1.32 ± 0.42), indicating reduced accuracy in absolute elevation, particularly in flat regions. Nonetheless, the SSIM remained consistently high (~0.99), reflecting that the overall terrain morphology was well preserved. Conversely, favoring the elevation term (α = 0.8, γ = 0.2) slightly improved RMSE over the slope-heavy variant but did not outperform the balanced configuration.
Per-tile statistics of the test dataset (Figure 10) reveal a steady decline in accuracy as the slope component weight increases. This trend indicates that while slope information enhances structural realism, excessive weighting can compromise elevation precision. Overall, the balanced weighting (α = 0.5, γ = 0.5) demonstrated the most accurate and consistent results across terrain types.
3.4. Terrain-Specific Performance
3.4.1. Loss Functions
To assess how terrain morphology influences super-resolution accuracy, a stratified evaluation was conducted across three terrain classes—flat, hilly, and mountainous. Each test tile was categorized using slope-based masks derived from the ground truth DTM, and performance metrics were computed for each class. The evaluated metrics included Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Bias (mean signed error). This stratified approach enabled quantification of model performance under increasing topographic complexity.
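A possible form of the slope-based classification is sketched below. The slope thresholds separating flat, hilly, and mountainous cells are illustrative assumptions, as the study does not state its exact class boundaries:

```python
import numpy as np

def classify_terrain(dtm, cellsize=1.0, flat_max=5.0, hilly_max=15.0):
    """Classify each cell as flat (0), hilly (1), or mountainous (2)
    from local slope in degrees, derived from elevation gradients."""
    dz_dy, dz_dx = np.gradient(dtm, cellsize)
    slope_deg = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    classes = np.zeros(dtm.shape, dtype=np.uint8)
    classes[slope_deg > flat_max] = 1
    classes[slope_deg > hilly_max] = 2
    return classes

# A plane tilted at 45 degrees (dz/dx = 1 on a 1 m grid) is
# classified as mountainous everywhere.
x = np.arange(50, dtype=float)
plane = np.tile(x, (50, 1))
print(np.unique(classify_terrain(plane)))  # [2]
```

Per-class metrics then follow by masking the signed error array with `classes == k` for each class k.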
The results, illustrated in Figure 11, Figure 12 and Figure 13 and summarized in Table 3, show that MAE values increase with terrain ruggedness. The LF4 model achieved the lowest MAE overall, with approximately 0.85 m in flat and hilly regions and 1.32 m in mountainous areas. LF3 followed closely with similar values of 0.90 m (flat), 0.91 m (hilly), and 1.33 m (mountainous), demonstrating robust generalization. LF1 produced slightly higher MAE (1.21–1.52 m), while LF2 performed substantially worse, particularly in flat terrain, where MAE reached 30.14 m.
RMSE results confirmed these trends. LF4 yielded the lowest RMSE in flat terrain (1.15 m) and remained under 2 m in mountainous areas (1.75 m). LF3 showed comparable performance with RMSE values up to 1.76 m, while LF1 ranged between 1.62 m and 1.99 m. LF2 again showed extreme errors, with RMSE peaking at 40.85 m in flat terrain.
Bias analysis revealed systematic patterns in elevation estimation. LF3 presented minimal bias across all terrain types (+0.26 m in mountainous areas), indicating strong balance and stability. LF1 consistently exhibited a positive bias (e.g., +0.82 m in flat regions), suggesting a slight overestimation trend. Conversely, LF4 showed a mild negative bias (−0.47 m in flat regions), indicating slight underestimation likely tied to its slope emphasis. LF2 displayed large positive biases in all regions (+29.76 m in flat terrain), confirming its instability.
3.4.2. Fine Tuning
To evaluate the performance of super-resolution models under different fine-tuning configurations for the LF4 loss function, the same accuracy metrics (MAE, RMSE, and Bias) were computed across flat, hilly, and mountainous regions. Results are summarized in Table 4 and visualized in Figure 14, Figure 15 and Figure 16.
MAE served as a key indicator of average elevation deviation. Across all terrain types, the lowest MAE values were consistently achieved in flat and hilly regions, where several fine-tuned models reached sub-meter accuracy. The best results were recorded in flat areas, with MAE as low as 0.83 m, indicating that fine-tuned LF4 configurations can closely approximate true elevation in homogeneous terrain. In contrast, mountainous regions exhibited higher MAE values, reaching up to 1.41 m, particularly when slope-based components dominated. Configurations balancing slope and elevation components achieved improved performance, with MAE around 1.29 m, suggesting better generalization in complex topography.
RMSE results mirrored the MAE trends. The lowest RMSE values appeared in flat and hilly areas (1.14–1.17 m), while mountainous regions showed higher errors (up to 1.84 m). RMSE proved most stable when slope and elevation weights were balanced, whereas disproportionate weighting toward either component led to greater variability and larger outlier errors. Balanced configurations not only reduced RMSE but also ensured consistent performance across all terrain classes.
Bias analysis revealed distinct systematic tendencies. Configurations emphasizing elevation accuracy exhibited negative bias (e.g., −0.47 m in flat regions), indicating mild underestimation of elevation. This pattern remained stable across terrains, suggesting that elevation-weighted (L1-dominant) losses compress elevation ranges. Conversely, slope-dominant configurations showed positive bias, particularly in flat areas (+0.83 m), reflecting overestimation due to excessive focus on gradient features. Balanced models maintained near-zero bias across all terrain types, achieving well-centered elevation predictions with minimal directional error.
3.5. Training Behavior and Convergence
3.5.1. Training Behavior of Different Loss Functions
To assess the learning dynamics of the super-resolution model, the training loss was monitored over 100 epochs for each loss function (LF1–LF4). Figure 17 illustrates the training loss curves, which provide insight into convergence rate, stability, and overall learning behavior.
The results reveal distinct convergence characteristics among the four loss functions. LF1 demonstrated a rapid decline in loss during the initial epochs, stabilizing by approximately epoch 15 with a final loss value near 0.10. This pattern indicates efficient and consistent learning with no observable overfitting. LF2 exhibited a similar early trajectory, starting from a slightly lower initial loss and converging marginally faster to 0.06–0.07. Gentle oscillations in the later epochs suggest sustained fine-tuning, likely influenced by its elevation-weighted structure.
In contrast, LF3 showed a slower, more gradual convergence pattern, leveling off around epoch 30 at approximately 0.10. This behavior is consistent with the Laplacian-based formulation, which prioritizes structural preservation and evolves cautiously during optimization. LF4, which combines L1 and slope terms, displayed a steep initial loss reduction followed by a smooth, stable decline, reaching final values of 0.11–0.12. Despite converging to slightly higher loss levels, the curve remained notably stable, reflecting effective balance between accuracy and terrain-structure awareness.
3.5.2. Training Behavior Under Different Fine-Tuning Weights
To further optimize the model's capacity to preserve both elevation accuracy and terrain structure, the selected composite loss (LF4) was fine-tuned by varying the relative weights of its elevation (α) and slope (γ) components. The three configurations tested were (α = 0.8, γ = 0.2), (α = 0.5, γ = 0.5), and (α = 0.2, γ = 0.8). The corresponding training loss curves are shown in Figure 18 and provide insight into how weighting affects convergence dynamics.
The elevation-dominant configuration (α = 0.8, γ = 0.2) exhibited a rapid decline in training loss during the first few epochs, stabilizing around epoch 20 at a final value near 0.10. This behavior indicates strong convergence toward minimizing elevation errors. However, this configuration’s weaker slope constraint may lead to reduced capability in preserving fine terrain structure in complex areas.
In contrast, the balanced configuration (α = 0.5, γ = 0.5) showed slower but steadier convergence, reaching stability around epoch 30 and attaining the lowest final loss value of approximately 0.09. The smooth and consistent curve suggests an effective balance between minimizing elevation discrepancies and maintaining terrain morphology.
The slope-dominant configuration (α = 0.2, γ = 0.8) converged more gradually, with the training loss flattening only after epoch 40 and reaching a slightly higher final value of about 0.12. Although this configuration enhances slope representation, especially in rugged regions, it converges more slowly and exhibits somewhat higher residual error in elevation prediction. Nevertheless, the absence of oscillations in all three cases confirms stable optimization throughout training.
4. Discussion
4.1. Visual Assessment of Terrain Reconstruction
The hillshade comparisons indicate that the proposed model enhances the visual interpretability of terrain by restoring coherent geomorphological patterns rather than merely increasing pixel density. Some smoothing is still evident in the super-resolved DTMs, particularly in steep or highly dissected areas, suggesting that certain fine-scale features remain challenging to reconstruct from coarse inputs. This behavior is consistent with the inherent information loss at 10 m resolution and reflects a trade-off between noise suppression and detail recovery.
4.2. Quantitative Evaluation of Loss Functions
The quantitative evaluation demonstrated that incorporating terrain-aware information into the loss function significantly enhances DTM super-resolution performance. Among the four tested formulations, the elevation-gradient loss (LF4) consistently outperformed the others across all key evaluation metrics, confirming the effectiveness of explicitly embedding slope information in the training process.
The superior performance of LF4 can be attributed to its balanced design, which simultaneously minimizes elevation discrepancies (through the L1 term) and preserves terrain morphology (via the slope-consistency term). This dual emphasis allows the model to maintain both vertical accuracy and horizontal structure, resulting in more realistic topographic reconstructions. In contrast, LF2’s elevation-weighted approach introduced instability, suggesting that dynamically adjusting pixel importance based on elevation deviation may amplify noise or lead to overfitting, particularly in flat or uniform regions.
LF1 and LF3, while performing adequately, highlighted different strengths. LF1’s composite structure improved general accuracy, but its reliance on multiple competing terms limited its ability to preserve sharp geomorphological transitions. LF3’s Laplacian pyramid formulation better retained high-frequency details such as ridges and break lines, but its multi-scale weighting did not fully capture terrain continuity across slopes. The comparative results emphasize that simplicity and physical relevance in loss design—rather than mathematical complexity—yield better generalization for DTM enhancement tasks.
4.3. Loss Weight Tuning Analysis
The loss-weight tuning analysis highlights the critical role of maintaining equilibrium between elevation fidelity and slope consistency in terrain super-resolution. The superior performance of the balanced configuration (α = 0.5, γ = 0.5) suggests that both elevation and gradient cues contribute complementary information: elevation terms ensure numerical stability, while slope terms preserve geomorphological continuity.
Overemphasizing the slope component (α = 0.2, γ = 0.8) introduces excessive sensitivity to local gradients, which may amplify noise and degrade accuracy in smoother landscapes. In contrast, relying too heavily on elevation alone (α = 0.8, γ = 0.2) limits the model’s ability to recover fine-scale structural detail, leading to smoother but less realistic reconstructions. The observed trends confirm that an intermediate weighting allows the model to generalize better across diverse terrain types by integrating both absolute and relational elevation information.
4.4. Terrain-Specific Performance
4.4.1. Loss Functions
The stratified terrain-based evaluation provides important insights into how topographic complexity affects the performance of different loss formulations. Both LF3 and LF4 performed robustly across all terrain classes, confirming that integrating structural awareness, whether through multi-scale Laplacian components (LF3) or slope consistency (LF4), significantly improves the model's capacity to generalize beyond flat surfaces.
LF3’s minimal bias and strong stability across varying relief suggest that its Laplacian pyramid component enhances sensitivity to multi-scale features while maintaining global elevation balance. This behavior is particularly beneficial for mountainous terrain, where abrupt elevation transitions can otherwise induce local distortions. LF4, although slightly biased toward underestimation, achieved the best overall numerical accuracy, indicating that slope-based constraints effectively preserve terrain geometry while limiting extreme errors.
In contrast, LF1, while consistent, exhibited a persistent positive bias, likely due to the dominance of pixel-wise losses (L1/L2) that do not explicitly encode structural gradients. Such models may produce smoother but slightly elevated terrain surfaces, potentially reducing hydrological realism. LF2’s poor performance across all metrics highlights the limitations of elevation-weighted schemes that fail to integrate geometric regularization, resulting in large systematic offsets and reduced predictive reliability.
Most of the residuals reported in Table 3 are expected and typically occur in locations with abrupt high-frequency terrain changes, where fine-scale elevation discontinuities are difficult to fully reconstruct from 10 m inputs. These effects are well-known in DEM super-resolution and do not alter the overall performance trends.
4.4.2. Fine Tuning
The fine-tuning experiments for the LF4 loss function further highlight the critical role of balanced loss weighting in DTM super-resolution. The observed trends confirm that overemphasis on either slope or elevation terms compromises model generalization, particularly when transferring across terrains of varying relief.
In flat and hilly terrains, where gradients are gentle, slope-heavy models tend to overfit to minor elevation variations, resulting in artificial exaggeration of relief and positive bias. Conversely, elevation-dominant models underestimate heights, indicating a compression effect that smooths the terrain excessively. These systematic biases illustrate the inherent trade-off between preserving local geometry and maintaining global elevation accuracy.
The balanced configuration, however, mitigates these effects by jointly optimizing for structural integrity and elevation fidelity. The near-zero bias and sub-meter MAE observed across all terrain types indicate that the equal weighting scheme provides stable, terrain-independent performance. This balance allows the model to maintain sharp topographic features without amplifying noise or introducing elevation drift.
In mountainous terrain, where abrupt elevation changes and steep gradients dominate, even balanced models face increased RMSE due to the complexity of fine-scale relief. Nonetheless, their superior consistency across classes demonstrates robust generalization and a clear advantage for operational applications requiring uniform performance.
4.5. Training Behavior and Convergence
4.5.1. Training Behavior of Different Loss Functions
The analysis of training dynamics highlights how loss design directly influences model convergence, learning stability, and overall optimization efficiency. The rapid and smooth convergence of LF1 and LF2 suggests that these formulations facilitate efficient gradient propagation, allowing the model to reach low loss values early in training. However, the slightly fluctuating pattern of LF2 indicates a potential trade-off: while its elevation-weighted loss accelerates learning, it may also introduce sensitivity to elevation range variability, potentially affecting stability across diverse terrains.
The gradual convergence of LF3 reflects the intrinsic behavior of Laplacian-based loss functions, which emphasize structural refinement and penalize abrupt spatial discrepancies. This slower learning pace is advantageous for preserving fine-scale terrain morphology, even though it delays reaching minimum loss values.
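The multi-scale structure that drives LF3's gradual convergence can be sketched as follows. This is a simplified stand-in, not the paper's implementation: it uses block averaging in place of Gaussian filtering and nearest-neighbour upsampling, and compares corresponding pyramid levels with a mean L1 distance.

```python
import numpy as np

def downsample(x):
    """2x downsample by block averaging (simplified stand-in for blur + decimate)."""
    h, w = x.shape
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """2x nearest-neighbour upsample."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def laplacian_pyramid(x, levels=3):
    """Decompose a DEM tile into band-pass detail levels plus a low-frequency residual."""
    pyr, cur = [], x
    for _ in range(levels - 1):
        low = downsample(cur)
        pyr.append(cur - upsample(low))  # detail retained at this scale
        cur = low
    pyr.append(cur)  # coarse residual
    return pyr

def laplacian_loss(pred, target, levels=3):
    """Mean L1 distance between corresponding pyramid levels (illustrative)."""
    pp = laplacian_pyramid(pred, levels)
    tp = laplacian_pyramid(target, levels)
    return sum(np.mean(np.abs(p - t)) for p, t in zip(pp, tp)) / levels
```

Separating detail bands from the coarse residual is what lets a loss of this family penalize fine-scale structural discrepancies independently of the overall elevation offset, consistent with the structural-refinement behavior described above.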
Meanwhile, LF4 achieves a desirable compromise between learning speed and stability. Its training curve demonstrates controlled, consistent convergence without oscillations or divergence—an indicator of robust optimization. The slightly higher final loss value does not necessarily imply inferior performance; rather, it reflects a more conservative adjustment process, which prioritizes slope and gradient consistency alongside elevation accuracy.
4.5.2. Training Behavior Under Different Fine-Tuning Weights
The convergence trends across the three fine-tuning configurations highlight how loss weighting directly governs learning behavior and optimization stability in DTM super-resolution. The rapid convergence observed in the elevation-heavy configuration (α = 0.8, γ = 0.2) indicates efficient gradient flow when elevation accuracy dominates the training objective. However, this setup tends to prioritize vertical precision over geomorphological realism, which may reduce structural fidelity in complex terrains.
Conversely, the slope-heavy configuration (α = 0.2, γ = 0.8) emphasizes surface gradients and morphological features but does so at the expense of absolute elevation accuracy and convergence speed. The slower training observed for this setup suggests that learning detailed slope patterns requires more epochs to stabilize, particularly in less variable regions such as plains.
The balanced configuration (α = 0.5, γ = 0.5) emerges as the most stable and efficient compromise. It maintains smooth convergence, achieves the lowest final training loss, and effectively integrates both elevation and slope learning objectives. This equilibrium allows the model to preserve fine-scale morphological detail without introducing significant elevation bias or instability.
The results also point toward several avenues in which the current framework could be extended. The present study focuses on elevation-aware loss functions that encode slope information to preserve geomorphic structure, but the architecture could be expanded into a multi-task formulation. Predicting elevation and slope jointly, rather than deriving slope as a secondary product, may strengthen the physical consistency of the output and reduce structural drift in steep or heterogeneous terrain. A second direction concerns model confidence. Terrain super-resolution models often behave differently across landforms, and an explicit estimate of uncertainty—whether through dropout sampling, ensemble strategies, or other probabilistic methods—would help quantify how the model reacts in areas dominated by sharp breaks or smooth plains. These ideas do not alter the core findings but highlight where the approach could evolve as higher-resolution training material becomes available.
5. Conclusions
This study introduced a deep learning-based framework for the super-resolution of Digital Terrain Models (DTMs) using a modified Residual Channel Attention Network (RCAN) architecture. The model successfully upscaled 10 m input DTMs to 1 m resolution and was trained and evaluated on a LiDAR-derived dataset covering diverse Italian landscapes, including flat, hilly, and mountainous regions. Central to the methodology was the design and optimization of elevation-aware loss functions, combining absolute elevation accuracy (L1) with slope-preserving terms to improve terrain realism and precision.
Experimental results demonstrated that a balanced loss configuration—equally weighting elevation and slope components—provided the most robust and generalizable performance. This setting achieved the lowest RMSE and MAE values across all terrain classes, while maintaining a bias close to zero, indicating well-centered predictions without systematic over- or underestimation. Although the achieved accuracy remains slightly above the nominal precision of the LiDAR reference data, the improvements are substantial, especially in complex terrains. Mountainous regions presented higher variability and error magnitudes, yet the model effectively preserved key geomorphological features, confirming its ability to retain structural integrity even in high-relief conditions.
Analysis of training behavior further revealed that Laplacian- and slope-aware losses enhanced structural learning at the cost of slower convergence, while simpler elevation-based losses converged faster but with reduced morphological fidelity. The study also highlights that models biased toward either elevation or slope accuracy alone tend to introduce directional prediction errors, whereas balanced configurations maintain both accuracy and stability.
Despite its strong performance, the framework’s generalization to other regions, resolutions, or input sources such as SRTM or ALOS remains to be validated. Additionally, the computational cost and the need for multiple training seeds to ensure stability may constrain large-scale or real-time applications. Nonetheless, the model’s lightweight inference and demonstrated reliability position it as a promising tool for geomorphology, hydrology, and landscape analysis, particularly in areas lacking LiDAR coverage.
Beyond the limitations identified here, several technical directions could strengthen the physical consistency and reliability of future terrain super-resolution frameworks. A multi-task formulation—where the network jointly predicts elevation and slope instead of deriving slope as a secondary product—may help stabilize geometric continuity in areas with abrupt gradients. Likewise, incorporating uncertainty estimation, such as ensemble predictions or dropout-based sampling, would offer insight into where the model is confident and where its outputs require caution. These additions do not alter the main conclusions of this study but point toward meaningful extensions for improving model robustness across diverse geomorphological settings.
Future research will focus on extending the model’s transferability to diverse terrains and sensor inputs, incorporating uncertainty quantification, and refining its performance toward near–LiDAR-level precision. With continued optimization, this framework has strong potential to become an operational approach for enhancing elevation data quality across a wide range of environmental applications.
6. Limitations and Practical Implications
The super-resolution model developed in this study demonstrates strong potential for enhancing Digital Terrain Models (DTMs) from coarse to fine resolution; however, several limitations constrain its broader applicability. The model was trained exclusively on LiDAR-derived DTMs from selected regions of Italy, covering terrain types such as flat plains, hilly areas, and mountainous zones. Consequently, its generalization to other geographic contexts or data sources, including global DTMs such as SRTM or ALOS World 3D, remains unconfirmed; we are currently exploring the application of the model to these datasets to assess its transferability and performance in a wider geographic context. The model may also exhibit reduced reliability in terrain types absent from the training dataset, such as volcanic fields, dune systems, densely built-up environments, or periglacial areas, where elevation patterns differ significantly from those encountered during training. Extending the training dataset to include these more diverse morphologies would likely improve its adaptability.
Furthermore, while the framework effectively supports a 10× upscaling factor (10 m–1 m), extending its use to coarser datasets (e.g., 30 m–3 m) would likely require architectural modifications, retraining, or fine-tuning to preserve both elevation accuracy and structural integrity. A critical limitation lies in the model’s dependence on high-quality ground truth data. LiDAR, although precise, is costly, geographically limited, and computationally demanding to process, restricting the scalability of this approach to larger or less surveyed regions.
Training the RCAN-based model also proved computationally intensive, with convergence achieved only after 100 epochs across multiple loss configurations and random seeds. The combination of large tile sizes and the 10× upsampling ratio further increased GPU memory and processing requirements, posing challenges for large-scale or repeated applications. Additionally, comparisons with other state-of-the-art super-resolution models, such as SRGAN, EDSR, or recent Transformer-based architectures, were not performed due to their limited upsampling capabilities relative to RCAN. Including such comparisons represents a valuable future direction for a more comprehensive evaluation of the model’s performance.
Despite these constraints, the proposed framework shows strong potential for operational use in fields such as geomorphology, hydrology, and terrain analysis, particularly where LiDAR data are unavailable. Applications may include drainage network extraction, slope stability assessment, and landform characterization, where enhanced elevation precision can substantially improve analytical outcomes. For practical deployment, it is essential to validate the model across diverse environmental conditions and data sources, ensuring robustness and adaptability. Future work should also incorporate uncertainty quantification mechanisms to identify potential quality issues and guide users in interpreting model outputs confidently. By addressing these aspects, the framework could evolve into a reliable and transferable tool for global-scale terrain enhancement and geospatial analysis.