Comparison of DEM Super-Resolution Methods Based on Interpolation and Neural Networks

High-resolution digital elevation models (DEMs) play a critical role in geospatial databases, which can be applied to many terrain-related studies such as facility siting, hydrological analysis, and urban design. However, due to the limitation of precision of equipment, there are big gaps to collect high-resolution DEM data. A practical idea is to recover high-resolution DEMs from easily obtained low-resolution DEMs, and this process is termed DEM super-resolution (SR). However, traditional DEM SR methods (e.g., bicubic interpolation) tend to over-smooth high-frequency regions on account of the operation of averaging local variations. With the recent development of machine learning, image SR methods have made great progress. Nevertheless, due to the complexity of terrain characters (e.g., peak and valley) and the huge difference between elevation field and image RGB (Red, Green, and Blue) value field, there are few works that apply image SR methods to the task of DEM SR. Therefore, this paper investigates the question of whether the state-of-the-art image SR methods are appropriate for DEM SR. More specifically, the traditional interpolation method and three excellent SR methods based on neural networks are chosen for comparison. Experimental results suggest that SRGAN (Super-Resolution with Generative Adversarial Network) presents the best performance on accuracy evaluation over a series of DEM SR experiments.


Introduction
As one of the most important digital representations of terrain, DEMs record spatial elevation information in a regular raster form [1]. Through visualizing fluctuating characters of terrain surfaces, DEMs can be widely applied in the domains including facility siting, hydrological analysis, and urban design [2][3][4]. With the rapid development of measuring equipment, DEM data can be generated from various sources, which accelerate its universality in landform analysis applications [5]. Specifically, the synthetic aperture radar (SAR) has been used as the primary source of DEMs at a global scale for its robustness in all weather conditions [6]. Despite the wide usage of SAR data, the limitation of equipment precision can still result in systematic errors that reduce the resolutions of DEM products. The inadequate spatial resolution of DEM data also restricts its usage in terrain-related analyses [7,8]. For example, using recent and accurate topographic data rather than lowresolution DEM data can obtain better accuracy for flood inundation modeling [9]. In addition, higher-resolution DEM is able to return more accurate predicted terrain features such as stream network and sub-basin classification [10]. The most direct solution to obtain high-resolution DEM is to improve the precision of measuring equipment, but this process is difficult, costly, and time-consuming. Therefore, generating high-resolution DEMs without extra cost becomes a key concern of researchers from various fields [11,12].
Traditionally, a practical way to obtain high-resolution DEMs without extra cost is to recover from easily obtained low-resolution DEMs, and this process is termed DEM which are usually classified into three categories: interpolation-based, reconstruction-based, and learning-based [11]. Firstly, interpolation-based methods improve the resolution of low-resolution images by estimating values of unknown pixels in target high-resolution images [31]. In this process, the interpolation kernel (or function) is the key concern; given a specific interpolation kernel, values of unknown pixels can be calculated from their neighboring pixels. However, although interpolation-based methods are less time consuming, they tend to blur high-frequency regions during the estimation. Then, to tackle this problem, reconstruction-based methods integrate image prior knowledge to generate high-resolution images such as gradient profile prior [32] and natural image prior [33]. Therefore, compared with the interpolation-based methods, the reconstruction-based methods have the advantages of preserving edges and suppressing artifacts. However, this type of method is not suitable to produce high-resolution images at relatively large magnification factors. Finally, considering the learning-based strategies, the relevant methods attempt to construct the direct relations between high-resolution images and low-resolution images. For example, inspired by the idea of compress sensing, Yang et al. [19] proposed to jointly learn two dictionaries for both high-resolution images and low-resolution images. In this way, high-resolution images can be reconstructed by the sparse codes of low-resolution images and the corresponding high-resolution dictionary. Although learning-based methods can gather high-frequency information from a learning process, the quality of training data will have a great impact on results [34].
DEM SR techniques developed along with the road of image SR, and the three types of image SR methods can be also applied to DEM SR [35,36]. However, similar dilemmas of these methods in image SR have also appeared in DEM SR. Then, researchers attempt to solve this problem from other perspectives, and the development of neural networks brings a new direction. For example, Dong et al. [21] introduced convolutional neural network (CNN) for image SR, which learns an end-to-end mapping between low-resolution images and high-resolution images. In this method, a lightweight but effective network structure is adopted, which shows better overall image quality. Then, presenting the potential to solve undetermined problems, generative adversarial networks (GANs) have also been applied to image SR. For example, Ledig et al. [17] proposed a single image SR method based on a generative adversarial network (SRGAN), which obtained effective results on the perceptual feelings of images. Then, by improving the network structure of SRGAN, Wang et al. [18] proposed ESRGAN to further enhance the visual quality of generated images. Based on the performance of image SR methods, some researchers have also applied these methods to DEM data [34]. However, the recent image SR methods based on neural networks mainly focus on the visual quality of high-resolution images, while the focus of DEM SR is more concerned with the accuracy and terrain features of generated high-resolution DEM. Therefore, this paper aims to investigate whether image SR methods with the target of visual quality are suitable for the task of DEM SR. More specifically, three excellent SR methods based on neural networks (including SRGAN, ESRGAN, and CEDGAN) and the commonly used interpolation method (bicubic interpolation) are chosen for comparison. Details of these techniques are presented in Section 3.

SR Methods Based on Interpolation
Interpolation SR methods generate high-resolution images by estimating unknown pixel values in the high-resolution grids [31,37]. In this process, the value of an unknown pixel is calculated by its neighboring pixels based on a specific interpolation kernel. The commonly used interpolation methods include nearest neighbor interpolation, bilinear interpolation, and bicubic interpolation, among which the bicubic interpolation obtains the best accuracy [38]. Therefore, in this paper, bicubic interpolation is chosen for comparison, and this interpolation process is presented in Figure 1. In the bicubic interpolation, values of unknown pixels in the high-resolution grids (e.g., p(x,y)) are estimated based on their nearest 16 pixels, and the formula is presented in Figure 1c. From this formula, high-frequency details will be easily smoothed for estimating unknown pixel values by the weighted sum of neighboring pixel values. Moreover, the accuracy of the interpolation results depends on the interpolation kernel W. Therefore, for valid comparisons, a commonly used interpolation kernel is chosen in this paper, which is described as Equation (1).
where t denotes distances between unknown pixels (e.g., p(x,y)) and existing pixels, and a is a given parameter, which is usually set as −0.5. interpolation, and bicubic interpolation, among which the bicubic interpolation obtains the best accuracy [38]. Therefore, in this paper, bicubic interpolation is chosen for comparison, and this interpolation process is presented in Figure 1. In the bicubic interpolation, values of unknown pixels in the high-resolution grids (e.g., p(x,y)) are estimated based on their nearest 16 pixels, and the formula is presented in Figure 1c. From this formula, high-frequency details will be easily smoothed for estimating unknown pixel values by the weighted sum of neighboring pixel values. Moreover, the accuracy of the interpolation results depends on the interpolation kernel W. Therefore, for valid comparisons, a commonly used interpolation kernel is chosen in this paper, which is described as Equation (1).
where denotes distances between unknown pixels (e.g., p(x,y)) and existing pixels, and a is a given parameter, which is usually set as −0.5.

SR Methods Based on Neural Networks
Recently, neural networks have been introduced to various tasks of computer vision including image segmentation, image classification, and image SR [39][40][41]. Compared with traditional image SR methods (e.g., sparse reconstruction), neural network SR methods can learn an end-to-end mapping between high-resolution (HR) and low-resolution (LR) images, and complex features can be automatically learned by hidden layers. Therefore, the pattern of hidden layers is critical to final results, for which most concerns have been paid to the structure design of neural networks. Among the neural network structures, generative adversarial networks (GANs) adopt a unique adversarial structure, which is suggested to have the advantage of sidestepping the difficulty of approximating many intractable probabilistic computations [42]. More specifically, there are two independent neural networks in GAN, which include a generator (G) and a discriminator (D). During the training process, the discriminator is trained to make the best judgment, while the generator is trained to maximally confuse the discriminator. When applied to image SR, the discriminator is trained to distinguish whether an image is real or generated by the generator, while the generator is trained to generate fake realistic images that the discriminator cannot distinguish. Therefore, with such a training strategy, when a training balance is reached, the generator can automatically learn to generate high-resolution images that are highly similar to the real images. This training process can be described as Equation (2).

SR Methods Based on Neural Networks
Recently, neural networks have been introduced to various tasks of computer vision including image segmentation, image classification, and image SR [39][40][41]. Compared with traditional image SR methods (e.g., sparse reconstruction), neural network SR methods can learn an end-to-end mapping between high-resolution (HR) and low-resolution (LR) images, and complex features can be automatically learned by hidden layers. Therefore, the pattern of hidden layers is critical to final results, for which most concerns have been paid to the structure design of neural networks. Among the neural network structures, generative adversarial networks (GANs) adopt a unique adversarial structure, which is suggested to have the advantage of sidestepping the difficulty of approximating many intractable probabilistic computations [42]. More specifically, there are two independent neural networks in GAN, which include a generator (G) and a discriminator (D). During the training process, the discriminator is trained to make the best judgment, while the generator is trained to maximally confuse the discriminator. When applied to image SR, the discriminator is trained to distinguish whether an image is real or generated by the generator, while the generator is trained to generate fake realistic images that the discriminator cannot distinguish. Therefore, with such a training strategy, when a training balance is reached, the generator can automatically learn to generate high-resolution images that are highly similar to the real images. This training process can be described as Equation (2).
where log D θ D I HR denotes that the discriminator (D) can correctly recognize real highresolution images (I HR ), and log(1 − D θ D G θ G (I LR ) denotes that the discriminator (D) can correctly recognize fake images generated by the generator (G θ G (I LR )). A general framework of this training process is presented in Figure 2. The adversarial loss is obtained from the discriminator, which can be automatically calculated during the process of adversarial training. This type of loss function is also a unique characteristic of GAN, which means that GAN can be trained without other specifically designed loss functions. Therefore, the design of the generator and discriminator has a large impact on the final results, which is also the major concern of scientists from many fields.
of adversarial training. This type of loss function is also a unique characteristic of GAN, which means that GAN can be trained without other specifically designed loss functions. Therefore, the design of the generator and discriminator has a large impact on the final results, which is also the major concern of scientists from many fields. Nowadays, there are some pioneering works that introduce GAN to the task of SR, and the variations of the model include SRGAN and ESRGAN. A similar structural feature of these models is to input LR (low-resolution) images and output HR (high-resolution) images that preserve similar global data distribution patterns with the corresponding LR images. This feature is also suitable for DEM data. Then, CEDGAN is specifically designed for spatial interpolation, and experiments in terms of DEM data present that CEDGAN is more effective than traditional interpolation methods (e.g., kriging interpolation) [28]. Therefore, in this paper, we will investigate their applicability to DEM SR in a systematic manner. Details of their structures and advantages are discussed in the following sections.

SRGAN
The rise of convolutional neural networks (CNNs) had a substantial impact on image SR, which greatly enhanced the accuracy of image SR. Before SRGAN, most image SR methods based on neural networks are of supervised learning, the optimization target of which is usually the minimization of the mean squared error (MSE) between recovered images and corresponding real HR images. Although it is convenient for optimizing, the training index of MSE is unable to capture perceptual features of images (e.g., high-frequency texture detail), as it is defined based on pixel-wise image differences. Therefore, Ledig et al. [17] firstly attempted to integrate GAN to improve the photo-realistic perception during image SR, and a method named SRGAN is proposed. In this approach, a perceptual loss function is well designed to replace the original commonly used pixel-based loss function. There are two major parts in the perceptual loss function, i.e., adversarial loss and content loss. Different from the most widely used pixel-based content loss (MSE), a VGG loss, which is based on the ReLU activation layers of the pre-trained 19 layer VGG network presented in [43], is defined to describe the content loss. In other words, highlevel features of real HR images and generated images from the generator are firstly extracted, and then, the Euclidean distance between the two feature representations is Nowadays, there are some pioneering works that introduce GAN to the task of SR, and the variations of the model include SRGAN and ESRGAN. A similar structural feature of these models is to input LR (low-resolution) images and output HR (high-resolution) images that preserve similar global data distribution patterns with the corresponding LR images. This feature is also suitable for DEM data. Then, CEDGAN is specifically designed for spatial interpolation, and experiments in terms of DEM data present that CEDGAN is more effective than traditional interpolation methods (e.g., kriging interpolation) [28]. Therefore, in this paper, we will investigate their applicability to DEM SR in a systematic manner. Details of their structures and advantages are discussed in the following sections.

SRGAN
The rise of convolutional neural networks (CNNs) had a substantial impact on image SR, which greatly enhanced the accuracy of image SR. Before SRGAN, most image SR methods based on neural networks are of supervised learning, the optimization target of which is usually the minimization of the mean squared error (MSE) between recovered images and corresponding real HR images. Although it is convenient for optimizing, the training index of MSE is unable to capture perceptual features of images (e.g., highfrequency texture detail), as it is defined based on pixel-wise image differences. Therefore, Ledig et al. [17] firstly attempted to integrate GAN to improve the photo-realistic perception during image SR, and a method named SRGAN is proposed. In this approach, a perceptual loss function is well designed to replace the original commonly used pixel-based loss function. There are two major parts in the perceptual loss function, i.e., adversarial loss and content loss. Different from the most widely used pixel-based content loss (MSE), a VGG loss, which is based on the ReLU activation layers of the pre-trained 19 layer VGG network presented in [43], is defined to describe the content loss. In other words, high-level features of real HR images and generated images from the generator are firstly extracted, and then, the Euclidean distance between the two feature representations is calculated as the content loss. In this way, more and finer perceptual texture features can be perceived during the training process, which can be ultimately recovered on the generated images. Details of the generator in SRGAN are presented in Figure 3. It can be observed that the residual block with an identical layout is the major core of the generator. The residual block is proposed by [44], which is proposed to solve training problems of deep neural networks. More specifically, deep neural networks are difficult to train, and the performance will degrade when stacking excessive layers. With the residual block, features extracted by shallower layers can be effectively transmitted to corresponding deeper layers. This advantage is significant for image SR, since feature extraction is critical to the final generated results.
Therefore, compared with other SR methods based on neural networks, SRGAN can train the generator to perceive more perceptual features and generate more photo-realistic images. Compared with images, there are some common features in DEM data. For example, characters of terrain textures on plain areas are similar to image textures, as they both present smooth transitions. However, as mentioned above, there are also some unique terrain texture features in DEM data (e.g., ridge and river), which is very different from the natural texture features in images. Moreover, the elevation differences in neighboring cells are significant in many complex terrain areas (e.g., bluff areas), and that is much larger than the largest difference between pixel values (i.e., the range of RGB value of 0-255). Therefore, whether this natural image texture features-based method is suitable for DEM SR task needs to be carefully explored, and more details of experiments are presented in Section 4.

ESRGAN
ESRGAN is proposed based on SRGAN, which is suggested to improve the overall perceptual quality of SR images [18]. In ESRGAN, there are two major modifications in the generator, which are presented in Figure 4. The first modification is to remove all Batch Normalization (BN) layers (Figure 4a), and the second is to replace the original residual block with the proposed Residual-in-Residual Dense Block (RRDB, Figure 4b). Firstly, in SRGAN, features extracted from former layers (e.g., convolution) will be normalized by the BN layer using the mean and variance of each batch during training, and using the estimated mean and variance of the whole training dataset during testing. In this way, Wang et al. [18] suggest that BN layers will bring unpleasant artifacts and limit the generation ability when there are huge differences between the statistics of training and testing datasets. Therefore, BN layers are removed in this method. Then, based on the observation that more layers and connections could always boost performance, the original residual block in SRGAN is replaced by RRDB, which employs a deeper and more complex Therefore, compared with other SR methods based on neural networks, SRGAN can train the generator to perceive more perceptual features and generate more photo-realistic images. Compared with images, there are some common features in DEM data. For example, characters of terrain textures on plain areas are similar to image textures, as they both present smooth transitions. However, as mentioned above, there are also some unique terrain texture features in DEM data (e.g., ridge and river), which is very different from the natural texture features in images. Moreover, the elevation differences in neighboring cells are significant in many complex terrain areas (e.g., bluff areas), and that is much larger than the largest difference between pixel values (i.e., the range of RGB value of 0-255). Therefore, whether this natural image texture features-based method is suitable for DEM SR task needs to be carefully explored, and more details of experiments are presented in Section 4.

ESRGAN
ESRGAN is proposed based on SRGAN, which is suggested to improve the overall perceptual quality of SR images [18]. In ESRGAN, there are two major modifications in the generator, which are presented in Figure 4. The first modification is to remove all Batch Normalization (BN) layers (Figure 4a), and the second is to replace the original residual block with the proposed Residual-in-Residual Dense Block (RRDB, Figure 4b). Firstly, in SRGAN, features extracted from former layers (e.g., convolution) will be normalized by the BN layer using the mean and variance of each batch during training, and using the estimated mean and variance of the whole training dataset during testing. In this way, Wang et al. [18] suggest that BN layers will bring unpleasant artifacts and limit the generation ability when there are huge differences between the statistics of training and testing datasets. Therefore, BN layers are removed in this method. Then, based on the observation that more layers and connections could always boost performance, the original residual block in SRGAN is replaced by RRDB, which employs a deeper and more complex structure ( Figure 4b). With such two modifications, ESRGAN is suggested to be capable of generating more realistic textures during image SR than SRGAN. Therefore, this paper aims to investigate whether ESRGAN has a similar better performance than SRGAN on DEM SR, and more details of experiments are presented in Section 4. structure ( Figure 4b). With such two modifications, ESRGAN is suggested to be capable of generating more realistic textures during image SR than SRGAN. Therefore, this paper aims to investigate whether ESRGAN has a similar better performance than SRGAN on DEM SR, and more details of experiments are presented in Section 4.

CEDGAN
CEDGAN is specially designed for spatial interpolation, and that is an endeavor to investigate the deep spatial knowledge using artificial intelligence [28]. Different from the above two methods, this method introduces the structure of conditional generative adversarial networks (cGAN), which is an extension of GAN [45]. In this method, an encoderdecoder structure is used to construct the generator, which is suggested to be suitable for spatial feature extraction [46], and the structure of the generator is presented in Figure 5. It can be observed that compared with SRGAN or ESRGAN, the generator in CEDGAN is a lightweight network, and fewer parameters make it easier to train. The experimental results in Zhu et al. [28] present that with only several iterations, relatively accurate DEMs can be generated by the generator. Then, with more iterations, this method can generate more accurate DEMs than traditional spatial interpolation methods (e.g., the kriging method). Therefore, this specially designed method is also chosen to compare with image SR methods to show which is better for DEM SR, and more details of experiments are presented in Section 4.

Evaluation Indexes
Considering terrain features and the realistic demand of high-resolution DEM data, several commonly used evaluation indexes related to the accuracy and features of DEM are chosen for comparison, and they can be represented in general root mean square error (RMSE) and mean error (ME) forms presented in Equation (3).

CEDGAN
CEDGAN is specially designed for spatial interpolation, and that is an endeavor to investigate the deep spatial knowledge using artificial intelligence [28]. Different from the above two methods, this method introduces the structure of conditional generative adversarial networks (cGAN), which is an extension of GAN [45]. In this method, an encoder-decoder structure is used to construct the generator, which is suggested to be suitable for spatial feature extraction [46], and the structure of the generator is presented in Figure 5. It can be observed that compared with SRGAN or ESRGAN, the generator in CEDGAN is a lightweight network, and fewer parameters make it easier to train. The experimental results in Zhu et al. [28] present that with only several iterations, relatively accurate DEMs can be generated by the generator. Then, with more iterations, this method can generate more accurate DEMs than traditional spatial interpolation methods (e.g., the kriging method). Therefore, this specially designed method is also chosen to compare with image SR methods to show which is better for DEM SR, and more details of experiments are presented in Section 4. structure (Figure 4b). With such two modifications, ESRGAN is suggested to be capable of generating more realistic textures during image SR than SRGAN. Therefore, this paper aims to investigate whether ESRGAN has a similar better performance than SRGAN on DEM SR, and more details of experiments are presented in Section 4.

CEDGAN
CEDGAN is specially designed for spatial interpolation, and that is an endeavor to investigate the deep spatial knowledge using artificial intelligence [28]. Different from the above two methods, this method introduces the structure of conditional generative adversarial networks (cGAN), which is an extension of GAN [45]. In this method, an encoderdecoder structure is used to construct the generator, which is suggested to be suitable for spatial feature extraction [46], and the structure of the generator is presented in Figure 5. It can be observed that compared with SRGAN or ESRGAN, the generator in CEDGAN is a lightweight network, and fewer parameters make it easier to train. The experimental results in Zhu et al. [28] present that with only several iterations, relatively accurate DEMs can be generated by the generator. Then, with more iterations, this method can generate more accurate DEMs than traditional spatial interpolation methods (e.g., the kriging method). Therefore, this specially designed method is also chosen to compare with image SR methods to show which is better for DEM SR, and more details of experiments are presented in Section 4.

Evaluation Indexes
Considering terrain features and the realistic demand of high-resolution DEM data, several commonly used evaluation indexes related to the accuracy and features of DEM are chosen for comparison, and they can be represented in general root mean square error (RMSE) and mean error (ME) forms presented in Equation (3).

Evaluation Indexes
Considering terrain features and the realistic demand of high-resolution DEM data, several commonly used evaluation indexes related to the accuracy and features of DEM are chosen for comparison, and they can be represented in general root mean square error (RMSE) and mean error (ME) forms presented in Equation (3). where F oi and F gi denote the attribute value of feature F (slope, aspect, and elevation) of the ith cell in the original real HR DEM and generated fake HR DEM, respectively, n is the total cell number of DEM, and ME F denotes the mean error of feature F. In this paper, three types of features including elevation, slope, and aspect are chosen. Among these indexes, the RMSE of elevation can reveal the global accuracy of generated DEMs, and features of slope and aspect can reveal the extent of feature preservation of generated DEMs. In addition to the evaluation of terrain structural features, the evaluation of terrain critical points is also important [47]. Therefore, the displacement distances of peak points and valley points between real HR DEM and generated HR DEM are used to evaluate the ability of the methods to preserve terrain critical points. In our experiments, the top 20 peak points and the lowest 20 valley points are chosen to calculate their displacement distances, and a shorter distance represents a better feature preservation performance.

Data Descriptions and Parameters
To investigate the effectiveness of the selected methods on DEM SR (SRGAN, ESRGAN, and CEDGAN), a dataset of digital elevation models (DEMs) with complex terrain features is used in this paper, which was acquired from the USGS (United States Geological Survey). More specifically, DEM data with the size of 3584 × 3584 from ten different regions are selected, and terrain elevations in the dataset range from 0.5 to 3741 m. The resolution of DEM data is 30 m, and an example region is presented in Figure 6. There can be observed many typical terrain features such as ridges, rivers, and mountains, and there are also some discrete areas with sudden elevation changes, which are much more complex than texture details in natural images. After preprocessing, single-channel DEM tiles (1 × 64 × 64) with no repetition are randomly cropped, and a total of 31,360 high-resolution DEM tiles are obtained. To address the concerns of over-fitting and memorization of training samples, a method of systematic sampling is used to sample 25,088 DEM tiles as training data and the other 6272 DEM tiles as validation data. More specifically, an equal number of DEMs will be sampled from every region. To accommodate the selected methods to DEM data, the original three-channel network structures in SRGAN and ESRGAN are adapted for singlechannel data (CEDGAN is a single-channel network). Moreover, DEM tiles are normalized into float tensor units ([−1.0, 1.0]) based on the respective maximum and minimum cell values of each DEM [28], and this process is presented in Equation (4): where DEM i−n denotes the normalized version of DEM i , and H max and H min denote the maximum and minimum cell values of DEM i , respectively. In this way, the large elevation gaps between DEM tiles from different areas (e.g., depression and plateau) can be avoided, and tiny terrain features can be learned rather than overlooked during the training process. parameters (e.g., optimizer, the slope of the leak for layers with LeakyReLU activation) are kept the same as their original setting in the above methods for performance comparisons. Then, the learning rates of different methods are tuned for the best training results (0.0000001 in SRGAN, 0.0002 in ESRGAN, and 0.00001 in CEDGAN). All experiments are conducted with a downsampling factor of 4, and LR DEMs are obtained by downsampling the HR DEMs using a bicubic kernel with the same downsampling factor. All the models are trained with 100 epochs, and the training images are divided in a random way in each epoch.

Training Procedure
In this section, training details of the selected methods are described. Figures 7-9 respectively present the variation of model accuracy (RMSE-Elevation) and adversarial losses for generator (G) and discriminator (D) in SRGAN, ESRGAN, and CEDGAN during the training procedure. Firstly, as for SRGAN (Figure 7), values of the model error drop dramatically at the first 20,000 batches and become stable after that. We train on 156,800 batches (100 epochs), and it presents that the average error of generated high-resolution DEM gradually stabilized at 1.7 m, which is a surprising result considering complex DEM training data and challenging parameters setting including the relatively larger patch size (1 × 64 × 64) and downsampling factor (4×) used in our training process. Then, the average errors of ESRGAN and CEDGAN are 2.2 meters and 6.4 m, respectively. However, the bicubic interpolation method can obtain an average error of 2.1 m on training data without any pre-training. Therefore, it can be observed that only SRGAN outperforms the bicubic interpolation method during the training process. Then, some parameters also need to be specified before the training process. Most of the parameters (e.g., optimizer, the slope of the leak for layers with LeakyReLU activation) are kept the same as their original setting in the above methods for performance comparisons. Then, the learning rates of different methods are tuned for the best training results (0.0000001 in SRGAN, 0.0002 in ESRGAN, and 0.00001 in CEDGAN). All experiments are conducted with a downsampling factor of 4, and LR DEMs are obtained by downsampling the HR DEMs using a bicubic kernel with the same downsampling factor. All the models are trained with 100 epochs, and the training images are divided in a random way in each epoch.

Training Procedure
In this section, training details of the selected methods are described. Figures 7-9 respectively present the variation of model accuracy (RMSE-Elevation) and adversarial losses for generator (G) and discriminator (D) in SRGAN, ESRGAN, and CEDGAN during the training procedure. Firstly, as for SRGAN (Figure 7), values of the model error drop dramatically at the first 20,000 batches and become stable after that. We train on 156,800 batches (100 epochs), and it presents that the average error of generated highresolution DEM gradually stabilized at 1.7 m, which is a surprising result considering complex DEM training data and challenging parameters setting including the relatively larger patch size (1 × 64 × 64) and downsampling factor (4×) used in our training process. Then, the average errors of ESRGAN and CEDGAN are 2.2 meters and 6.4 m, respectively. However, the bicubic interpolation method can obtain an average error of 2.1 m on training data without any pre-training. Therefore, it can be observed that only SRGAN outperforms the bicubic interpolation method during the training process.
In addition to values of accuracy during the training process, adversarial losses for D and G of these methods are also described at the upper location in Figures 7-9. Firstly, in Figure 7, there can be observed a decreasing tendency of both adversarial losses, which reveals an intrinsic characteristic of GAN (that is, the adversary system results in the development of both D and G). A similar phenomenon appears in the adversarial losses in ESRGAN and CEDGAN (Figures 8 and 9), but their competitions present to be more intense than that in SRGAN. For example, in CEDGAN, the loss of D decreased, and the loss of G increased in the first 20,000 training batches, and that means D suppresses G. Then, the competition state reverses after some training batches (i.e., G is stronger than D). Finally, both D and G gradually become stable. Therefore, it can be concluded that the three methods tend to converge to their game equilibrium during the training.      In addition to values of accuracy during the training process, adversarial losses for D and G of these methods are also described at the upper location in Figures 7-9. Firstly, in Figure 7, there can be observed a decreasing tendency of both adversarial losses, which reveals an intrinsic characteristic of GAN (that is, the adversary system results in the development of both D and G). A similar phenomenon appears in the adversarial losses in ESRGAN and CEDGAN (Figures 8 and 9), but their competitions present to be more intense than that in SRGAN. For example, in CEDGAN, the loss of D decreased, and the loss of G increased in the first 20,000 training batches, and that means D suppresses G. Then, the competition state reverses after some training batches (i.e., G is stronger than D). Finally, both D and G gradually become stable. Therefore, it can be concluded that the three methods tend to converge to their game equilibrium during the training.

Results
Although these methods present different performances on training data, their ability to generate high-resolution DEMs based on the validation set is more focused. Therefore, in this section, we apply the three well-trained models on the validation set to test whether they are over-fitting or memorize training samples. Three types of evaluation are proposed on these methods.

Quantitative Evaluation
In this section, the evaluation indexes proposed in Section 3.3 are utilized to test the effectiveness of the three well-trained models. The validation set consists of 6272 DEM tiles from ten different regions, and the evaluated results of the generated high-resolution DEMs are presented in Table 1. It should be noted that these evaluation index values of all DEM tiles are respectively averaged for a global evaluation. From Table 1, the main observation is that the three neural network-based methods (SRGAN, ESRGAN, and CEDGAN) generate worse results in terms of RMSE-Elevation on the validation set than those on the training data, and that is also a characteristic of learning methods. Nevertheless, the results are still acceptable with small floating precision, and thus, our training is effective. In addition, SRGAN outperforms the other three methods in terms of the indexes of RMSE-Elevation, RMSE-Slope, and RMSE-Aspect. However, considering the preservation of critical points (i.e., peak and valley points), bicubic interpolation (Bicubic) obtains the best results. This is caused by the difference between DEM generation mechanisms of interpolation-based methods and neural network-based methods. More specifically, for interpolation-based methods, some cell values are kept fixed, and only unknown cell values are estimated in terms of existing cell values; as a comparison, all cell values will be regenerated by neural network-based methods. In other words, there is a global restriction framework in interpolation-based methods, which makes constraints for peak and valley points (please see Figure 1). Then, as for neural network-based methods, SRGAN still obtains the best results. Next, more details of the performance of these methods on the validation set are investigated. Firstly, the range of the obtained results of RMSE-Elevation by these methods is divided into five grades (i.e., 0-1, 1-2, 2-3, 3-4, and larger than 4). Then, features of terrain elevations on the corresponding real high-resolution DEM tiles in each grade are analyzed. More specifically, if a method generates a fake high-resolution DEM with the RMSE-Elevation larger than 4, the corresponding real high-resolution DEM will be grouped into ">4" (please see Figure 10). Then, three indexes including the elevation difference between the maximum and minimum, average value of elevations, and variance of elevations of the high-resolution DEM tile are calculated to reveal its terrain complexity; and the values of the three indexes of all DEM tiles in the group ">4" are respectively averaged to reveal global characteristic. The results are presented in Figure 10. It can be first observed a strong correlation between the RMSE-Elevation and terrain complexity. That means DEMs with more complex terrain features (i.e., higher values of the three indexes) are more difficult to recover (larger errors) during the SR process. Then, SRGAN presents to be the most robust method to generate high-resolution DEM, for which SRGAN can recover LR DEM tiles with the same level of terrain complexity to relatively higher precision. Therefore, it can be concluded that SRGAN is better than the other three methods for DEM SR considering both the robustness and preservation of accuracy and features on the generated high-resolution DEMs.
presents to be the most robust method to generate high-resolution DEM, for which SRGAN can recover LR DEM tiles with the same level of terrain complexity to relatively higher precision. Therefore, it can be concluded that SRGAN is better than the other three methods for DEM SR considering both the robustness and preservation of accuracy and features on the generated high-resolution DEMs.

Visual Evaluation
Although SRGAN outperforms the other methods from the perspective of quantitative evaluation, visual evaluation is needed, as there are many cases in image SR in which images with better quantitative indexes present worse visual perception [17,18]. In this section, a visual evaluation of these methods is implemented, and we selected from each grade a representative DEM tile for illustrating the results ( Figure 11). From Figure 11, it can be observed that terrain textures are always over-smoothed in the results of bicubic interpolation. Therefore, although bicubic interpolation outperforms ESRGAN in terms of the index of RMSE-Elevation, the results of ESRGAN present a comparable (or better) visual perception than bicubic interpolation (i.e., preserving more terrain details). However, some local terrain textures seem to be distorted in the recovered DEMs by ESRGAN (red boxes in Figure 11), which might be the factor to reduce its accuracy. As a comparison, this phenomenon rarely appears in the results of SRGAN, and thus, SRGAN outperforms the other methods from this perspective. Finally, considering CEDGAN, there are some noisy signals in its results, and those seem to be much worse than the other methods. Therefore, the visual results suggest that CEDGAN may not be appropriate for the task of DEM SR.

Visual Evaluation
Although SRGAN outperforms the other methods from the perspective of quantitative evaluation, visual evaluation is needed, as there are many cases in image SR in which images with better quantitative indexes present worse visual perception [17,18]. In this section, a visual evaluation of these methods is implemented, and we selected from each grade a representative DEM tile for illustrating the results ( Figure 11). From Figure 11, it can be observed that terrain textures are always over-smoothed in the results of bicubic interpolation. Therefore, although bicubic interpolation outperforms ESRGAN in terms of the index of RMSE-Elevation, the results of ESRGAN present a comparable (or better) visual perception than bicubic interpolation (i.e., preserving more terrain details). However, some local terrain textures seem to be distorted in the recovered DEMs by ESRGAN (red boxes in Figure 11), which might be the factor to reduce its accuracy. As a comparison, this phenomenon rarely appears in the results of SRGAN, and thus, SRGAN outperforms the other methods from this perspective. Finally, considering CEDGAN, there are some noisy signals in its results, and those seem to be much worse than the other methods. Therefore, the visual results suggest that CEDGAN may not be appropriate for the task of DEM SR. Sensors 2022, 22, x FOR PEER REVIEW 14 of 22 Figure 11. The super-resolution results of the selected methods (×4).

Evaluation of Terrain Features Preservation
Finally, we investigate the effectiveness of the methods in recovering terrain features. From the previous evaluations, it can be observed that CEDGAN generates the worst results. Therefore, in this evaluation, only bicubic interpolation, SRGAN, and ESRGAN are focused, and two DEM tiles in two accuracy grades with typical terrain features are chosen for comparison. Figures 12 and 13 present the slope and aspect results of the original DEMs and the high-resolution DEMs generated by these methods, respectively. Considering the evaluation of slope, intuitive perception is that the results of bicubic interpolation are much smoother than those of SRGAN and ESRGAN, and some textures disappear in the results of bicubic interpolation but are retained in SRGAN and ESRGAN. Then, compared with SRGAN, the slope results of ESRGAN present a much more complex feature distribution pattern (such as grid-line distribution). In the computer vision field, the advantage of ESRGAN is that it can preserve more local perceptual features with a more complex network structure when dealing with natural images [18] (see Figure 4). However, it can be observed from the slope results that for DEM SR, without the constraint of global structure (the global precision RMSE-Elevation), local features generated by ESRGAN also present to be out of control. Intuitively, distortions in the results of ESRGAN are obvious

Evaluation of Terrain Features Preservation
Finally, we investigate the effectiveness of the methods in recovering terrain features. From the previous evaluations, it can be observed that CEDGAN generates the worst results. Therefore, in this evaluation, only bicubic interpolation, SRGAN, and ESRGAN are focused, and two DEM tiles in two accuracy grades with typical terrain features are chosen for comparison. Figures 12 and 13 present the slope and aspect results of the original DEMs and the high-resolution DEMs generated by these methods, respectively. Considering the evaluation of slope, intuitive perception is that the results of bicubic interpolation are much smoother than those of SRGAN and ESRGAN, and some textures disappear in the results of bicubic interpolation but are retained in SRGAN and ESRGAN. Then, compared with SRGAN, the slope results of ESRGAN present a much more complex feature distribution pattern (such as grid-line distribution). In the computer vision field, the advantage of ESRGAN is that it can preserve more local perceptual features with a more complex network structure when dealing with natural images [18] (see Figure 4). However, it can be observed from the slope results that for DEM SR, without the constraint of global structure (the global precision RMSE-Elevation), local features generated by ESRGAN also present to be out of control. Intuitively, distortions in the results of ESRGAN are obvious compared with the results of SRGAN and bicubic interpolation. However, the slope results of SRGAN present to keep a better balance between global precision and local features, which may be the reason why SRGAN outperforms the other methods. Then, considering the evaluation of aspect, the three methods generate comparable results, and such a conclusion is consistent with the quantitative evaluation in Table 1. tures, which may be the reason why SRGAN outperforms the other methods. Then, considering the evaluation of aspect, the three methods generate comparable results, and such a conclusion is consistent with the quantitative evaluation in Table 1. Figures 14-16 present the results of structural lines (river and ridge) and critical points on the high-resolution DEMs generated by these methods, respectively. Firstly, it can be observed that SRGAN can obtain the best matching rate of river and ridge features. Then, ESRGAN outperforms bicubic interpolation in preserving river features, but bicubic interpolation performs better in preserving ridge features. As for the preservation of critical points (Figure 16), no obvious rule has been found in the two example results. In this regard, statistical results may be a better solution to evaluate the preservation of critical points, and the main conclusions can be obtained from Table 1 (please see Section 4.3.1). Therefore, to sum up, SRGAN outperforms the other methods in preserving most of the terrain features during DEM SR.   tures, which may be the reason why SRGAN outperforms the other methods. Then, considering the evaluation of aspect, the three methods generate comparable results, and such a conclusion is consistent with the quantitative evaluation in Table 1. Figures 14-16 present the results of structural lines (river and ridge) and critical points on the high-resolution DEMs generated by these methods, respectively. Firstly, it can be observed that SRGAN can obtain the best matching rate of river and ridge features. Then, ESRGAN outperforms bicubic interpolation in preserving river features, but bicubic interpolation performs better in preserving ridge features. As for the preservation of critical points (Figure 16), no obvious rule has been found in the two example results. In this regard, statistical results may be a better solution to evaluate the preservation of critical points, and the main conclusions can be obtained from Table 1 (please see Section 4.3.1). Therefore, to sum up, SRGAN outperforms the other methods in preserving most of the terrain features during DEM SR.    Figures 14-16 present the results of structural lines (river and ridge) and critical points on the high-resolution DEMs generated by these methods, respectively. Firstly, it can be observed that SRGAN can obtain the best matching rate of river and ridge features. Then, ESRGAN outperforms bicubic interpolation in preserving river features, but bicubic interpolation performs better in preserving ridge features. As for the preservation of critical points (Figure 16), no obvious rule has been found in the two example results. In this regard, statistical results may be a better solution to evaluate the preservation of critical points, and the main conclusions can be obtained from

Quantitative Analysis of the Computational Performance
During the training process, all the experiments were implemented on the device with a single Nvidia RTX 2080Ti GPU and Intell Core i9-10900X CPU @ 3.70 GHz, and all the experiments are implemented using Python. The time and space costs required for training are listed in Table 2. Generally, recovering DEM from lower-resolution will bring greater challenges to the task of image SR or DEM SR, which means more details are required to be recovered. Most traditional image SR methods can recover natural images from 3× downsampling [19]. With the development of SR techniques, SRGAN firstly used a framework to recover photo-realistic images from 4× downsampling [17], and subsequent methods are proposed with the benchmark of 4× downsampling to demonstrate their robustness and effectiveness. Then, in this section, the impact of the resolution on different SR methods is investigated, and the downsampling factor is set as 2× for comparison. The training details and the obtained evaluation results based on testing data are presented in Figure 17 and Table 3, respectively. Firstly, considering the training phase (Figure 17), the average error of the generated high-resolution DEM obtained by SRGAN, ESRGAN, and CEDGAN gradually stabilized at 0.65 m, 0.73 m, and 5.51 m, respectively. From the results, the training error obtained in terms of 2× downsampling is lower than that of 4× downsampling (the corresponding errors in terms of 4× downsampling obtained by SRGAN, ESRGAN, and CEDGAN are 1.7 m, 2.2 m, and 6.4 m, respectively). Then, the evaluation results at the testing phase present a similar tendency (see Section 4.3.1). Therefore, it can be  Table 2. Generally, recovering DEM from lower-resolution will bring greater challenges to the task of image SR or DEM SR, which means more details are required to be recovered. Most traditional image SR methods can recover natural images from 3× downsampling [19]. With the development of SR techniques, SRGAN firstly used a framework to recover photorealistic images from 4× downsampling [17], and subsequent methods are proposed with the benchmark of 4× downsampling to demonstrate their robustness and effectiveness. Then, in this section, the impact of the resolution on different SR methods is investigated, and the downsampling factor is set as 2× for comparison. The training details and the obtained evaluation results based on testing data are presented in Figure 17 and Table 3, respectively. Firstly, considering the training phase (Figure 17), the average error of the generated high-resolution DEM obtained by SRGAN, ESRGAN, and CEDGAN gradually stabilized at 0.65 m, 0.73 m, and 5.51 m, respectively. From the results, the training error obtained in terms of 2× downsampling is lower than that of 4× downsampling (the corresponding errors in terms of 4× downsampling obtained by SRGAN, ESRGAN, and CEDGAN are 1.7 m, 2.2 m, and 6.4 m, respectively). Then, the evaluation results at the testing phase present a similar tendency (see Section 4.3.1). Therefore, it can be concluded that DEM SR is more difficult with the larger downsampling factors. Moreover, SRGAN still outperforms the other methods in terms of the RMSE-Elevation.   In the above experiments, we select DEMs from ten different regions including various terrains (e.g., areas of plateau mountain and basin) to test the robustness of different models. Usually, it is difficult for deep learning-based methods to deal with datasets they are not familiar with, which means the deep spatial features in a new area are difficult to be captured by a model trained with different datasets. Therefore, in this section, we want to explore how these models perform when faced with datasets in new domains. Therefore, the DEM dataset from a new area is used to test the sensitivity of the models. From Figure 18, the DEM contains complex terrain features with the size of 448 × 448, and the resolution is 20 m, which is also different from DEMs of 30 m resolution used in the above experiments. All the results are obtained by using the well-trained models in Section 4.2, and the results  In the above experiments, we select DEMs from ten different regions including various terrains (e.g., areas of plateau mountain and basin) to test the robustness of different models. Usually, it is difficult for deep learning-based methods to deal with datasets they are not familiar with, which means the deep spatial features in a new area are difficult to be captured by a model trained with different datasets. Therefore, in this section, we want to explore how these models perform when faced with datasets in new domains. Therefore, the DEM dataset from a new area is used to test the sensitivity of the models. From Figure 18, the DEM contains complex terrain features with the size of 448 × 448, and the resolution is 20 m, which is also different from DEMs of 30 m resolution used in the above experiments. All the results are obtained by using the well-trained models in Section 4.2, and the results are listed in Table 4. The results present that bicubic outperforms other methods when dealing with such a new area. This suggests that although these deep learning models are trained with datasets from ten different regions, they are still weak to deal with new datasets. Then, compared with other deep learning methods, SRGAN still obtains the best results (with only one evaluation index exception), which also shows its robustness. are listed in Table 4. The results present that bicubic outperforms other methods when dealing with such a new area. This suggests that although these deep learning models are trained with datasets from ten different regions, they are still weak to deal with new datasets. Then, compared with other deep learning methods, SRGAN still obtains the best results (with only one evaluation index exception), which also shows its robustness.

Conclusions and Future Work
SR is a classic topic in the field of computer vision, which can be widely applied in relevant tasks. This technique has attracted the attention of researchers from different fields for its easy understanding but important target, i.e., recovering high-quality texture details from lower-grade images. Nowadays, with the rapid development of machine learning (especially deep learning), image SR has made great progress. Although the task of DEM SR presents to be similar to image SR, there are few works that introduce image SR methods to DEM SR. The reason may be that DEM data have much more complex terrain features and larger elevation differences than natural images do. Moreover, the lack of DEM data for training may also be a limiting factor for previous research. Therefore, this paper attempts to investigate the performance of image SR methods when applied to DEM SR. More specifically, traditional bicubic interpolation method and three state-of-the-art SR methods (SRGAN, ESRGAN, and CEDGAN) based on neural networks

Conclusions and Future Work
SR is a classic topic in the field of computer vision, which can be widely applied in relevant tasks. This technique has attracted the attention of researchers from different fields for its easy understanding but important target, i.e., recovering high-quality texture details from lower-grade images. Nowadays, with the rapid development of machine learning (especially deep learning), image SR has made great progress. Although the task of DEM SR presents to be similar to image SR, there are few works that introduce image SR methods to DEM SR. The reason may be that DEM data have much more complex terrain features and larger elevation differences than natural images do. Moreover, the lack of DEM data for training may also be a limiting factor for previous research. Therefore, this paper attempts to investigate the performance of image SR methods when applied to DEM SR. More specifically, traditional bicubic interpolation method and three state-of-the-art SR methods (SRGAN, ESRGAN, and CEDGAN) based on neural networks are selected, and terrain-related evaluation indexes including slope, aspect, river, ridge, and critical points are used for comparison.
The results present that SRGAN outperforms the other three methods in terms of most evaluation indexes, and CEDGAN presents to be the worst one. Considering these two methods, there are two major differences. The first is that SRGAN designs a much deeper generator network than CEDGAN, and the second is that SRGAN uses specific loss functions and CEDGAN uses only adversarial loss. Network structure and loss function are also the most concerning points of the methods based on neural networks. Therefore, specifically designing the network structure and appropriate loss function for spatial data is an operative road to integrate GIS tasks and artificial intelligence (AI). Then, ESRGAN is proposed based on the adaptation of SRGAN, and ESRGAN is suggested to be able to generate natural images with better visual perception performance. Although it can be observed that many terrain features are learned by ESRGAN, these terrain features are always distorted. Therefore, ESRGAN is appropriate for natural image SR, but this learning process may not be suitable for DEM data. Finally, compared with methods based on neural networks, bicubic interpolation can output relatively balanced results considering different evaluation indexes. Moreover, this method can be applied to different terrain data without a training process, which is more efficient than learning-based methods. However, the main problem is that most of the terrain features are over-smoothed after the interpolation process. Therefore, for the DEM SR task, SRGAN is recommended when adequate training data and time are available, and bicubic interpolation can be used when efficiency rather than terrain features are concerned.
In terms of the results, we plan to extend our work from the following perspectives: (1) ESRGAN outperforms SRGAN in generating high-resolution natural images with realistic visual perception. Although the original design of ESRGAN is inappropriate for DEM SR, its idea of adapting SRGAN can be introduced to DEM SR, and that means specifically designed structures of neural networks and loss functions for terrain data have great potential to improve the effectiveness of terrain features preservation during DEM SR.
(2) The idea of SR can also be introduced to other GIS fields. For example, trajectory points are usually collected between a fixed time interval (e.g., 1 min), which limits its application when dense trajectory points are needed [48]. Therefore, designing methods based on neural networks to increase the density of trajectory points is also a valuable exploratory direction.