RSPCN: Super-Resolution of Digital Elevation Model Based on Recursive Sub-Pixel Convolutional Neural Networks

Abstract: The digital elevation model (DEM) is one of the most significant fundamental geographical data models. The theory, methods and applications of DEM are hot research issues in geography, especially in geomorphology, hydrology, soil science and other related fields. In this paper, we improve the efficient sub-pixel convolutional neural networks (ESPCN) and propose recursive sub-pixel convolutional neural networks (RSPCN) to generate higher-resolution DEMs (HRDEMs) from low-resolution DEMs (LRDEMs). Firstly, the structure of RSPCN is described in detail based on recursion theory. The paper then explores the effects of different training datasets, with the self-adaptive learning rate Adam algorithm optimizing the model. Furthermore, the adding-"zero" boundary method is introduced into the RSPCN algorithm as a data preprocessing step, which improves the method's accuracy and convergence. Extensive experiments are conducted to train the method until optimality. Finally, comparisons are made with traditional interpolation methods, such as the bicubic, nearest-neighbor and bilinear methods. The results show that our method yields obvious improvements in both accuracy and robustness and further illustrate the feasibility of deep learning methods in the DEM data processing area.


Introduction
The digital elevation model (DEM [1]) is a digital simulation of a terrain surface using limited terrain elevation data, and it contains rich topographic information for geoscientific application and analysis [2]. DEM has the characteristics of simple geographical data organization, intuitive terrain information representation and efficient terrain factor interpretation. It is widely used in geomorphology, hydrology, soil science and other related fields and plays an important role in the practice of surveying and mapping, soil and water conservation, hydrogeological disaster monitoring and control, and land use management and planning [3]. Resolution in our paper denotes "pixel size", an important index describing the size of the smallest unit of a DEM. It is important to explore terrain modeling solutions for generating higher-resolution DEMs (HRDEMs) from captured low-resolution DEMs (LRDEMs) [4][5][6], referred to as super-resolution (SR) solutions. During the process of SR, a clearer image is reconstructed using the information of at least one low-resolution image [7][8][9][10]. SR technology has high application demand and broad application prospects, especially for applications that require higher-resolution and higher-fidelity terrain information. Some examples are: (1) real-time terrain simulations serving vehicle route planning and fast adaptation; (2) realistic terrain rendering used in ranging simulations, entertainment and highly realistic 3D gaming environments [5]; (3) the SR mode on satellites (such as the SPOT5 satellite) for achieving HRDEMs from low-resolution cameras [11,12]. In our study, we focus on the terrain modeling process from LRDEMs with the aim of obtaining HRDEM terrain models with high fidelity to the real terrain structures.

Study Datasets and Data Preprocessing
To show and compare different terrain modeling methods in a straightforward manner, we chose a publicly available collection [64] of single-scale elevation models of representative landforms as our experimental datasets. These HRDEMs of archetypal landform types are derived from the United States Geological Survey (USGS) 3D Elevation Program (3DEP) LiDAR sources with resolutions ranging from 2 to 10 m. The six archetypal landform datasets include an eroded plateau (Bryce Canyon, UT, USA, Figure 1, Full), a volcanic caldera (Crater Lake, OR, USA, Figure 2, Full), active sand dunes (Great Sand Dunes, CO, USA, Figure 3, Full), a braided riverbed (Jackson Hole, WY, USA, Figure 4, Full), a shield caldera (Kīlauea, HI, USA, Figure 5, Full) and stabilized sand dunes (Sandhills, NE, USA, Figure 6, Full). The datasets are clipped to the same size of 1000 × 1000 by downsampling the above DEMs with a medium interval. Then, these six classes of datasets are normalized, as shown in Formula (1), and shown as preprocessed DEMs in Figures 1-6. Finally, we obtain 500 × 500 LRDEMs by downsampling the 1000 × 1000 DEMs with a medium interval. These 500 × 500 preprocessed datasets serve as the experimental DEMs; detailed information about them is shown in Table 1.
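As a minimal sketch of this preprocessing pipeline: the min-max form of the normalization in Formula (1) and the sampling offset of the medium-interval downsampling are assumptions on our part; the sketch only illustrates the 1000 × 1000 → 500 × 500 workflow described above.

```python
import numpy as np

def downsample_medium_interval(dem: np.ndarray, factor: int = 2) -> np.ndarray:
    """Keep every `factor`-th cell; a sketch of "downsampling with medium interval"
    (the exact sampling offset used by the authors is an assumption)."""
    return dem[::factor, ::factor]

def normalize(dem: np.ndarray) -> np.ndarray:
    """Min-max normalization to [0, 1]; an assumed form of Formula (1)."""
    return (dem - dem.min()) / (dem.max() - dem.min())

# Example: a 1000 x 1000 preprocessed DEM yields a 500 x 500 LRDEM.
hrdem = normalize(np.random.rand(1000, 1000))   # placeholder for a real 3DEP tile
lrdem = downsample_medium_interval(hrdem, 2)    # shape (500, 500)
```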

Efficient Sub-Pixel Convolutional Neural Networks (ESPCN)
The specific process of the algorithm is as follows: LRDEM data are used as input to the network and pass through three sub-pixel convolutional layers that use the rectified linear unit (ReLU) as the activation function. After the first sub-pixel convolutional layer, with a 4 × 4 kernel, the data become 64-channel feature maps. The second sub-pixel convolutional layer, with a 3 × 3 kernel, finally outputs 4-channel data, which means that the length and width of the DEM grid are each doubled. The self-adaptive learning rate Adam [65] algorithm is used to optimize the model. The specific structure of our method is shown in Figure 7. The process of the data formation is shown in Figure 8. In our method, we use a common loss function in the network, the mean-square error (MSE) between the real DEM data $x_{i,j}$ and the generated super-resolution DEM data $y_{i,j}$, as shown in Formula (2), where $n$ is the number of rows/columns of the square matrix:

$$\mathrm{MSE} = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(x_{i,j}-y_{i,j}\right)^2 \quad (2)$$
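To make the layer description concrete, the following is a minimal TensorFlow sketch under our reading of the text. The paper names three layers but details only the 4 × 4 → 64-channel and 3 × 3 → 4-channel ones, so the 32-channel middle layer and all other hyperparameters here are assumptions, not the authors' exact configuration.

```python
import tensorflow as tf

def espcn(scale: int = 2) -> tf.keras.Model:
    """Minimal sketch of the three-layer sub-pixel CNN described above."""
    inp = tf.keras.Input(shape=(None, None, 1))  # single-channel LRDEM grid
    x = tf.keras.layers.Conv2D(64, 4, padding="same", activation="relu")(inp)  # 4x4 kernel -> 64 channels
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)    # middle layer: our assumption
    x = tf.keras.layers.Conv2D(scale ** 2, 3, padding="same")(x)               # 3x3 kernel -> 4 channels for x2
    # Sub-pixel step: rearrange the 4 channels into 2x2 spatial blocks,
    # doubling the length and width of the DEM grid.
    out = tf.nn.depth_to_space(x, scale)
    return tf.keras.Model(inp, out)

model = espcn(scale=2)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # Adam, delta = 0.0001
              loss="mse")                                              # Formula (2)
```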

Recursive Sub-Pixel Convolutional Neural Networks (RSPCN)
Based on ESPCN, the proposed RSPCN model is built by introducing the idea of recursion. The network is outlined in Figure 9 and contains three sub-networks: the embedding, recursion and reconstruction networks. The embedding network generates feature maps from LRDEMs. Next, the recursion part performs the super-resolution. Finally, the feature maps from the recursion part are fed into the reconstruction network to generate the output HRDEMs.

Figure 9. Architecture of RSPCN. It consists of three nets: embedding net, recursion net and reconstruction net. The recursion net is unfolded in Figure 7, as mentioned above.

The embedding network takes the input LRDEMs and extracts their features. The input parameter N denotes the magnification factor. Firstly, the LRDEMs are normalized and turned into binary codes. Secondly, the six different types of terrain are labeled as tensors (from 0 to 5). Finally, these tensors are passed into the recursion layer.
The recursion network is the main network, containing a recursive three-layer CNN that solves the SR task. Every recursive step completes one ESPCN process while the counter n increments, until n = N. Meanwhile, every recursion makes the cell size of the DEM smaller and outputs the labeled tensors.
The reconstruction network transforms these tensors (2-channel) back into the LRDEM space (3-channel), generating HRDEMs of the different terrain types, as sketched below.
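Under this reading, the recursion reduces to repeatedly applying one ×2 step. The sketch below is hypothetical: `step_model` stands for an ESPCN-style ×2 model such as the `espcn` sketch above, and the function name is ours.

```python
def rspcn_super_resolve(lrdem, step_model, n_recursions: int):
    """Apply one x2 ESPCN-style step repeatedly: n counts up to N, and each
    pass halves the DEM cell size (doubling the grid's length and width)."""
    x = lrdem
    for n in range(n_recursions):
        x = step_model(x)  # e.g., the espcn(scale=2) model sketched above
    return x
```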

Adding-"Zero" Data Preprocessing Method
During the convolutional layers, the boundary pixels of the DEMs cannot be convolved, because the boundary pixels cannot completely overlap the convolution kernel; for example, a one-pixel edge will not be processed by a 3 × 3 kernel. Therefore, we add "zero" (representing value = 255 in the gray picture) to the boundary of the DEMs. For the results, we delete the "zero" boundary of the generated DEMs and finally obtain the SRDEMs for comparison and analysis. The specific data processing steps are as follows.
Firstly, we add "zero" to the boundary of the 500 × 500 LRDEMs to build 600 × 600 LRDEMs as the training datasets. Then, after the training process, we obtain 1200 × 1200 generated DEMs. Finally, we obtain the final 1000 × 1000 SRDEMs by deleting the boundary. The whole process are shown in Figure 10. By comparing with the preprocessed 1000 × 1000 DEMs, which are shown in Figures 1-5, the accuracy of the final 1000 × 1000 SRDEMs is analyzed. The experimental environment is a modest laptop computer (a 64bit AMD Ryzen 54,600U with Radeon Graphics @ 2.10 GHz, RAM 16 GB). Visual Studio 2015 and Opencv 3.1.0 are used to transform the normalized data into binary data that is trained in the Tensorflow platform. The embedding network takes the input LRDEMs and extracts their features. The input parameter N denotes magnification factor. Firstly, LRDEMs are normalized and turned into binary codes. Secondly, six different types of terrain are labeled as tensors (from 0 to 5). Finally, these tensors will be passed into the recursion layer.
The recursion network is the main network, containing recursive three-layer CNN to solve the task of SR. Every recursive step completes one ESPCN process with the parameter n counting, until n = N. Meanwhile, every recursion makes the cell size of DEM smaller and outputs the tensors with labels.
The reconstruction network transforms these tensors (2 channel) back into the LRDEM space (3 channel), generating HRDEMs of different terrain.

Adding-"Zero" Data Preprocessing Method
During the convolutional layers, the boundary pixels of the DEMs cannot be convoluted. The reason is that the boundary pixels cannot completely overlap the convolution kernel. For example, the edge of one pixel will not be processed with a kernel 3 × 3. Therefore, we add "zero" (representing value = 255 in the gray picture) to the boundary of the DEMs. In the results, we will delete the "zero" boundary of the generated DEMs and finally obtain the SRDEMs to do the comparison and analysis. The specific data processing steps are as follows.
Firstly, we add "zero" to the boundary of the 500 × 500 LRDEMs to build 600 × 600 LRDEMs as the training datasets. Then, after the training process, we obtain 1200 × 1200 generated DEMs. Finally, we obtain the final 1000 × 1000 SRDEMs by deleting the boundary. The whole process are shown in Figure 10. By comparing with the preprocessed 1000 × 1000 DEMs, which are shown in Figures 1-5, the accuracy of the final 1000 × 1000 SRDEMs is analyzed. The experimental environment is a modest laptop computer (a 64-bit AMD Ryzen 54,600U with Radeon Graphics @ 2.10 GHz, RAM 16 GB). Visual Studio 2015 and Opencv 3.1.0 are used to transform the normalized data into binary data that is trained in the Tensorflow platform. . The process of "adding-zero": adding "zero" to the boundary of the 500 × 500 LRDEMs to build 600 × 600 input datasets before the RSPCN network; deleting the "zero" boundary of the 1200 × 1200 generated DEMs to obtain the final 1000 × 1000 SRDEMs. For visualization, the black outlines show the picture boundary.

Data Augmentation (DA)
During the experimental process, the input datasets we use are different types of LRDEMs, divided into six types (0 to 5), and the training datasets are HRDEMs derived from 3DEP. The training datasets have only one DEM per type. Thus, in order to alleviate over-fitting and expand the training datasets, we use the data augmentation (DA) method [66]. The detailed step is adding Gaussian noise ε ∼ N(0, ∆d) 100 times to each type of DEM, where ∆d denotes the resolution of the LRDEMs. Thus, we obtain 100 DEMs per type and 600 DEMs in total; the test samples contain merely six DEMs in total, with one DEM per type.
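A sketch of this augmentation is given below, assuming ∆d is used as the standard deviation of the Gaussian noise; the text leaves the parameterization of N(0, ∆d) ambiguous, so this choice is ours.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def augment(dem: np.ndarray, delta_d: float, copies: int = 100) -> list:
    """Create `copies` noisy versions of one DEM: eps ~ N(0, delta_d), where
    delta_d is the LRDEM resolution (std-vs-variance is our assumption)."""
    return [dem + rng.normal(0.0, delta_d, size=dem.shape) for _ in range(copies)]

# Six landform types x 100 noisy DEMs each = 600 training DEMs in total.
```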

Selection of Training Datasets
Through the above steps, we have two ways to generate training datasets. Training datasets No. 1: six types with 100 DEMs of each type (by DA). Training datasets No. 2: six types with 1 DEM of each type. The results are compared and analyzed for the two different types of training datasets. As the evaluation indexes of the experiments, the root mean square error (RMSE) [67,68], peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [69] are used. The smaller the RMSE, the higher the accuracy of the DEM super-resolution, as shown in Formula (3). The bigger the PSNR, the more similar the super-resolution DEM is to the real DEM data, as shown in Formulas (4) and (5). SSIM is another index measuring the similarity of two images, modeled as a combination of brightness, contrast and structure. The value of SSIM lies in the range of −1 to 1; the closer the SSIM is to 1, the more similar the super-resolution DEM is to the real DEM data, as shown in Formulas (6)-(8) [69]. In these formulas, x is the normalized 1000 × 1000 DEM data (shown in Figures 1-6) and y is the normalized final SRDEM data. In Formula (4), p is the number of bits per sample value, and MSE is the mean square error, as shown in Formula (5). In Formulas (6)-(8), $\mu_x$ and $\mu_y$ are the means of x and y, respectively; $\sigma_x$ and $\sigma_y$ are the standard deviations of x and y, respectively; $\sigma_{xy}$ is the covariance of x and y; and $c_1$, $c_2$ are fixed parameters used to maintain the stability of Formula (6), where L is the maximum value of the pixel (in this paper, L ≈ 1, due to normalization).
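For reference, standard forms of Formulas (3)-(8) consistent with the variables described above are given below; the SSIM constants $k_1 = 0.01$ and $k_2 = 0.03$ are the usual defaults and are our assumption.

$$\mathrm{RMSE}=\sqrt{\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(x_{i,j}-y_{i,j}\right)^2} \quad (3)$$

$$\mathrm{PSNR}=10\,\log_{10}\frac{\left(2^{p}-1\right)^{2}}{\mathrm{MSE}} \quad (4)$$

$$\mathrm{MSE}=\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(x_{i,j}-y_{i,j}\right)^{2} \quad (5)$$

$$\mathrm{SSIM}(x,y)=\frac{\left(2\mu_x\mu_y+c_1\right)\left(2\sigma_{xy}+c_2\right)}{\left(\mu_x^{2}+\mu_y^{2}+c_1\right)\left(\sigma_x^{2}+\sigma_y^{2}+c_2\right)} \quad (6)$$

$$c_1=\left(k_1 L\right)^{2} \quad (7)$$

$$c_2=\left(k_2 L\right)^{2} \quad (8)$$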
We compare different selections of Adam's initial learning rate (δ = 0.01, 0.001, 0.0001) using the two types of training datasets (No. 1 and No. 2), as shown in Tables 2 and 3, where m denotes the number of training steps. The results show that δ = 0.0001 makes the algorithm converge fastest for both training datasets. Extensive experiments are then conducted with the same number of training steps (500) for both training datasets; the results are shown in Table 4.
It can be seen from Table 4 that different training datasets have different effects on different types of landforms. Compared with the No. 1 training datasets, the No. 2 training datasets perform better, with the highest PSNR and the smallest RMSE. However, they have the disadvantage of instability, as shown in Figure 11. When Adam's initial learning rate is poorly chosen (for example, δ = 0.01 in Figure 11), the No. 2 training datasets make the algorithm unstable: as the training approaches 100 steps, the cost value oscillates from 2.0166 × 10⁻⁴ (the 91st step) back to 0.02023 (the 99th step). When the chosen initial learning rate is unsuitable, we therefore use the No. 1 training datasets with the DA process to increase the stability and accuracy of the algorithm. When the selected parameter makes the algorithm converge efficiently (such as δ = 0.0001 in Table 3), the No. 2 training datasets are the better choice.
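The learning-rate selection can be scripted as a simple sweep; this is a hypothetical sketch reusing the `espcn` model from above, where `train_lr` and `train_hr` are placeholder names for the LRDEM/HRDEM training tensors.

```python
import tensorflow as tf

for delta in (1e-2, 1e-3, 1e-4):  # candidate initial learning rates (Tables 2 and 3)
    m = espcn(scale=2)
    m.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=delta), loss="mse")
    # history = m.fit(train_lr, train_hr, epochs=500)  # 500 training steps, as in Table 4
    # Inspect history.history["loss"] for convergence speed and oscillation.
```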

Results Based on ESPCN
In this section, we use the No. 2 datasets with δ = 0.0001 to generate HRDEMs. After 10,000 training steps, the accuracy of our method compared with traditional interpolation methods (bicubic, nearest-neighbor, bilinear) is shown in Table 5. According to Table 5, the RSPCN method performs better across the different terrain types than the traditional interpolation algorithms, achieving the highest PSNR and the smallest RMSE on more terrain types. As for Kīlauea (shield caldera), although the PSNR and RMSE values are not the best, those of our method are very close to the best ones. All four methods perform well on the SSIM index (close to 1). For the bicubic interpolation method, the PSNR and RMSE values on the Jackson Hole DEM data are the best among the three traditional methods. For the nearest-neighbor interpolation method, the PSNR and RMSE values are better than those of the other two traditional methods except on Jackson Hole. The bilinear interpolation method gives results similar to those of bicubic interpolation, but it does not perform well on the Jackson Hole and Sand Hills DEM data. However, the evaluation index values of the three traditional methods are very close to each other, with only slight differences. The accuracy of the RSPCN model improves with every step of the training process, while the traditional interpolation methods cannot improve their precision by learning. Therefore, the RSPCN algorithm has clear advantages in improving the accuracy of DEM interpolation. The error maps of the generated SRDEMs for the different terrain types are shown in Figure 13. For each test site, the color bar scaling of the errors is the same across methods. Figure 13 shows that, compared with the other methods, our method has smaller maximum error values except for the Kīlauea terrain and can effectively reconstruct the SRDEMs.

Figure 13. The error maps of the generated SRDEMs with different methods.
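As a usage note, Figure 13-style error maps can be reproduced in a few lines. This sketch is ours, not the authors' plotting code; random arrays stand in for the real SRDEM/HRDEM pairs.

```python
import numpy as np

def error_map(sr: np.ndarray, hr: np.ndarray) -> np.ndarray:
    """Per-cell absolute elevation error, the quantity drawn in the error maps."""
    return np.abs(sr - hr)

# Hypothetical comparison: one HRDEM against two reconstructions.
hr = np.random.rand(1000, 1000)
sr_results = {"rspcn": hr + 0.01 * np.random.randn(1000, 1000),
              "bicubic": hr + 0.03 * np.random.randn(1000, 1000)}
errors = {name: error_map(sr, hr) for name, sr in sr_results.items()}
# One shared color scale per test site so the methods are visually comparable.
vmax = max(e.max() for e in errors.values())
```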

Results Based on RSPCN
Based on the proposed RSPCN method in Section 3.2, we use 62 × 62 LRDEMs to build 992 × 992 HRDEMs by setting the magnification factor N = 4 (2^N = 992/62 = 16). We choose the eroded plateau dataset (Bryce Canyon, UT, USA), as shown in Figure 1. Firstly, we add "zero" to the boundary of the 62 × 62 LRDEMs to generate 75 × 75 experimental DEMs, from which 1200 × 1200 HRDEMs are generated. After the training process, we delete the boundary and finally obtain the 992 × 992 SRDEMs. The accuracy of RSPCN is shown in Table 6. After 10,000 training steps, the process of the RSPCN and the loss function curves are shown in Figures 14 and 15.
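Putting the pieces together, the ×16 run follows this arithmetic; the 6/7 split of the 13-cell padding is an assumption (the authors' exact split is not stated), and the commented calls reuse the hypothetical sketches from earlier sections.

```python
import numpy as np

lrdem_62 = np.random.rand(62, 62)  # placeholder for the Bryce Canyon LRDEM
# Pad 62 x 62 -> 75 x 75 (13 extra cells, split 6/7 per side here by assumption).
padded_75 = np.pad(lrdem_62, ((6, 7), (6, 7)), mode="constant", constant_values=0)
# Four recursive x2 steps: 75 -> 150 -> 300 -> 600 -> 1200 (2**4 = 16).
# sr_1200 = rspcn_super_resolve(padded_75[None, :, :, None], model, n_recursions=4)
# Deleting the scaled boundary recovers 992 x 992: (1200 - 992) / 2 = 104 per side.
# sr_992 = sr_1200[0, 104:-104, 104:-104, 0]
```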

Discussion
As an important expression model of topography, DEM has been applied to the dynamic simulation of the ground surface, multi-scale modeling in geoscience and other related fields. DEM interpolation predicts the values of a primary variable from scattered data within the same sampled area [70,71], and it is an important way to build a spatially continuous surface from DEM data.
At present, the most common solutions for spatial interpolation focus on geostatistical methods. In 1951, D.G. Krige [72] presented a linear least squares regression interpolation method, the Kriging interpolation method. In the 1960s, Birkhoff and Garabedian [73] proposed the two-dimensional spline function, and De Boor [74] introduced bicubic spline interpolation. Spline interpolation fits the overall trend of the surface well but ignores the detailed information of the terrain. In 1964, Bengtsson and Nordbeck [75] proposed a vector-based method, the triangulated irregular network (TIN). TIN can reduce the redundancy of flat areas and represents topographic linear features well, and research on TIN continues [76][77][78][79]. However, TINs ignore non-linear information and are unable to represent spatial terrain types such as cliffs, caves or holes. In 1968, Shepard [80] proposed the inverse distance weighting (IDW) method. IDW is simple and fast to calculate, but it fails to incorporate the spatial structure and ignores information beyond the neighborhood.
Among these common interpolation methods, the Kriging suite has been proven to perform well in most DEM modeling tasks with high geometric accuracy and has become the most common method used in spatial interpolation. However, the goals of Kriging are practically unattainable, since the mean error and the variance of the errors are always unknown [71]. We also ran a Kriging algorithm programmed in MATLAB on the same 1000 × 1000 DEM array in the computational environment described in Section 4; the program exceeded the maximum array size preference in that environment. This shows that, compared with our method, the Kriging method is more computationally intensive.
Entering the 21st century, researchers also explored other theories for DEM interpolation. Based on fractal geometry theory, Paramanathan and Uthayakumar [81], Jiang et al. [82] and Liu and Wang et al. [83] developed the fractal interpolation method, which eliminates smoothing problems; however, it can only be used on regular grid data. Based on surface theory and differential geometry, Yue [71] presents the high accuracy surface modeling (HASM) method to simulate the Earth's surface system, which has been proven to perform well in eco-environmental surface modeling. Based on deep learning theory, "super-resolution" is commonly used in the image processing field. When it comes to DEM construction, our RSPCN method presents a deep-learning-based solution for spatial interpolation with higher accuracy than the traditional linear interpolation methods. In order to express the comparison between our method and the traditional methods more intuitively, we have carried out a pixel-level comparison, the result of which is attached as Supplementary Material. Moreover, the paper provides a whole set of data processing steps from the preprocessing of raw terrain datasets to the final output. The simple code makes our method well-suited for various data formats.

Conclusions
In this paper, we introduce and improve the ESPCN method. Then, based on recursion theory, we propose the recursive sub-pixel convolutional neural networks (RSPCN) method. The experimental datasets are six archetypal landform datasets derived from 3DEP LiDAR sources with typical terrain characteristics, including an eroded plateau, a volcanic caldera, active sand dunes, a braided riverbed, a shield caldera and stabilized sand dunes. Except for the shield caldera, the SR results of RSPCN are better than those of the traditional linear interpolation methods.
Our model can predict the terrain information of unmeasured points and approach the real terrain with relatively high fidelity to the real terrain structures. Compared with the traditional linear interpolation methods, the RSPCN method can effectively learn the probability distribution characteristics of the terrain, better balancing local and overall characteristics. After extensive experiments, except for the shield caldera terrain, our method performs better than bilinear interpolation, bicubic interpolation and nearest-neighbor interpolation. Compared with SRCNN, the classic deep-learning-based image super-resolution method, the RSPCN method operates directly at the sub-pixel level, decreasing the amount of computation. Compared with the ESPCN method, the RSPCN method introduces recursive ideas to shorten the depth of the network.
Meanwhile, we present a data preprocessing method by adding "zero" to the boundary of input DEM data, which improves the RSPCN method's accuracy and convergence. We believe that this adding "zero" method provides an available data preprocessing method in the CNN-based interpolation field. Further addressing and analyzing the effects of the various data preprocessing methods is another focus of our future work. Finally, as an alternative to other deep learning methods, the RSPCN method further explores the application of deep learning models in the field of DEM modeling.