Application of Multi-Channel Convolutional Neural Network to Improve DEM Data in Urban Cities

A digital elevation model (DEM) represents the topographic surface of the Earth and is an indispensable source of data in many applications, such as flood modeling, infrastructure design and land management. DEM data with high spatial resolution and high elevation accuracy are not only costly and time-consuming to acquire but also often confidential. In this paper, we explore a cost-effective approach to derive good-quality DEM data by applying a multi-channel convolutional neural network (CNN) to enhance freely available DEM data. Shuttle Radar Topography Mission (SRTM) data, Sentinel-2 multispectral imagery, and Google satellite imagery were used as inputs to the CNN model. The CNN model was first trained using high-quality reference DEM data in a dense urban city (Nice, France), then validated on another site in Nice, and finally tested in the Orchard Road area of Singapore, an equally dense urban area. The CNN model not only reduces the root mean square error (RMSE) by about 50% at the validation site in Nice and 30% at the test site in Singapore, but also yields much clearer profiles of the land surface than the input SRTM data. The CNN performance was also compared with that of an earlier study using artificial neural networks (ANN). Within this limited study, the comparison shows that the CNN yields a more accurate DEM.


Introduction
A digital elevation model (DEM) is a grid of topographic data. It represents the elevations of grid cells (pixels) in a given area [1], without further definition of the surface. DEM is often used as a generic term for both the digital surface model (DSM) and the digital terrain model (DTM). A DTM represents the bare ground surface, while a DSM includes all objects on the ground. DEM data can be obtained by ground surveying or by remote-sensing methods, including stereo photogrammetry, interferometric synthetic aperture radar (InSAR) and light detection and ranging (LiDAR) [1]. DEM data are an important input for many applications, such as ecology and glaciology modeling [2,3], hydrologic and flood simulations [4][5][6][7][8][9][10][11][12][13][14], and engineering infrastructure modeling [15].
In this paper, we define reference (surveyed) DEM data as DEM data with high spatial resolution and high accuracy (i.e., low vertical errors). Reference DEM data are the best source of data for many applications. However, acquiring reference (high-quality) DEM data is both time-consuming and expensive [16,17]. The challenges are even greater in developing countries with limited project funding. Moreover, access to such high-quality DEM data is often limited due to confidentiality [18].
Recently, many space-borne sources of DEM data on an almost global scale have become publicly accessible, such as the Shuttle Radar Topography Mission (SRTM) [19], Advanced [...] the ANN model together with Sentinel-2 multispectral imagery for enhancing the SRTM DEM (30 m resolution) in dense urban cities. The results showed that the RMSE of the improved SRTM was reduced by about 25-35%, and the visibility of land shapes, buildings, and roads was significantly improved over the original SRTM. This paper is a continuation of the work undertaken by Kim et al. [5]. We used a CNN instead of the standard ANN presented in Kim et al. [5]. While ANN is dominant where datasets are limited and image inputs are not necessary, CNN is well known in image processing for detecting the important features in input images [38][39][40][41]. We took advantage of CNN by applying a special treatment to the input data to create multi-channel input images. In this work, we implemented the multi-channel CNN to improve SRTM using Sentinel-2 multispectral imagery, Google satellite imagery and OpenStreetMap Buildings. High-resolution surveyed DEMs in Nice (France) and Singapore were used to train, validate and test the CNN model. The objective of this study was to examine the added value of CNN in topography improvement over urban cities (Nice and Singapore). If the improvement in accuracy is significant, this would be a valuable contribution toward generating much-improved DEMs from satellite DEMs at sites in countries where DEM data are often unavailable or confidential.
The paper is structured as follows. All available data are summarized in Section 2; the methodology of the scheme for the CNN model, including data pre-processing, the assessment method, and model configuration, is described in Section 3. Section 4 provides an analysis of the performances of the CNN model. Section 5 lists the key findings from this research work.

SRTM Data
The Shuttle Radar Topography Mission (SRTM) is an international project headed by the National Imagery and Mapping Agency (NIMA) of the United States Department of Defense (DoD) and the National Aeronautics and Space Administration (NASA). SRTM data are publicly accessible and generally considered the most suitable for flood modelling applications [42]. SRTM data at 30 m spatial resolution have been available since 2015, with an absolute vertical error of less than 16 m. However, there are known issues with SRTM, such as vertical offset errors, random noise, and vegetation/building biases. Moreover, due to its coarse resolution, SRTM does not reflect precise surface characteristics, especially in dense urban areas. A sample of SRTM DEM is shown in Figure 1c.

Ground Truth DEM
The ground truth DEM data (about 1 m spatial resolution and 40 cm vertical accuracy) in Nice were provided by Nice Côte d'Azur Metropolis, and those for Singapore were provided by Singapore's Building and Construction Authority. The reference data were measured with airborne light detection and ranging (LiDAR) equipment, and the accuracy levels are by design. These data were used for training, validating and testing the CNN. Both DEMs were collected in 2014. A sample of surveyed DEM data in Nice is shown in Figure 1b.
The ground truth DEM data were used as target data to train the CNN model. They were also used as observational data to evaluate the performance of the trained models.

Google Satellite Imagery
Google satellite imagery has a resolution ranging from 15 m to 15 cm and displays the Earth's surface as seen from a great distance. Google satellite images can be downloaded through SASPlanet (a free application for viewing and downloading satellite maps). The data consist of three RGB (red, green, blue) bands; combined, they produce an image that resembles the way our eyes see the world. The data can be used as inputs to the CNN model as well as for visual comparison. A sample of Google satellite imagery is shown in Figure 1a.

Sentinel-2 Multispectral Imagery
The Sentinel-2 data were developed by the ESA (European Space Agency) for monitoring variability in land surface conditions to support services such as forest monitoring, detection of land cover changes, and natural disaster management. Sentinel-2 consists of twin polar orbiting satellites under the same orbit, phased at 180 degrees to each other. The Sentinel-2 multispectral instrument (MSI) obtains the reflective wavelengths of multispectral observations with directional effects caused by the reflectance anisotropy of the surface. The MSI aims to measure the Earth's reflected radiance through the atmosphere using 13 spectral bands: from the visible and near-infrared (VNIR) through to the short-wave infrared (SWIR) [43][44][45]. The multispectral imagery is useful for land use classification, seasonal monitoring, and agricultural and environmental applications. Sentinel-2 data, with a 5-day revisit frequency, are also publicly accessible. Kim et al. [5] analyzed the reflectance of Sentinel-2 for varied land uses and found that the reflectance of short-wave infrared (SWIR) bands (bands 6-8) in forest areas was higher than that over urban areas and that the reflectance of near-infrared (NIR) bands (bands 2-5) in urban areas was higher than that over forest cover. In this study, this multispectral imagery was used to classify the different land covers, which had different error patterns.
Sentinel-2 contains an optical instrument payload that samples 13 spectral bands: four bands at 10 m, six bands at 20 m, and three bands at 60 m spatial resolution. For more details, refer to Kim et al. [5].
Sentinel-2 data were downloaded for two areas, Nice (France) and Singapore. The selection of Sentinel-2 imagery was based on low cloud cover, as more cloud presence gives rise to inaccuracies in the ground reflectance. The cloud filtering process in this paper involved screening of satellite imagery metadata and shortlisting only those with cloud presence of less than 10%. From these shortlisted tiles, visual screening for the least cloud presence over the study area was undertaken.
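The metadata screening step described above can be sketched as follows. This is a minimal illustration in Python (the study itself used MATLAB); the tile records, the `cloud_pct` field name and the function name are hypothetical, and only the 10% threshold comes from the text. The final visual screening over the study area is a manual step and is not represented.

```python
def shortlist_tiles(tiles, max_cloud_pct=10.0):
    """Keep tiles whose reported cloud cover is below the threshold, least cloudy first."""
    kept = [t for t in tiles if t["cloud_pct"] < max_cloud_pct]
    return sorted(kept, key=lambda t: t["cloud_pct"])


# Hypothetical metadata records; identifiers and values are made up for illustration.
tiles = [
    {"id": "tile_a", "cloud_pct": 3.2},
    {"id": "tile_b", "cloud_pct": 41.0},
    {"id": "tile_c", "cloud_pct": 9.9},
    {"id": "tile_d", "cloud_pct": 10.0},
]

shortlisted = shortlist_tiles(tiles)
print([t["id"] for t in shortlisted])
```

The shortlisted tiles would then be inspected visually to pick the one with the least cloud over the study area.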
Eight input features from Sentinel-2 were used for the CNN model: bands 02-08 and band 8A. These bands were selected for their high spatial resolution (10-20 m) and for the information they carry about vegetation and urban structures.

Building Footprint
OpenStreetMap (OSM) is a collaborative volunteered geographic information (VGI) project. It provides data that can be used in various ways, including the production of a digitized map accessible to the public at no cost [46]. A building footprint can be downloaded from OSM Buildings (http://osmbuildings.org) (accessed on 15 February 2021) in a vector data format that can be read in geographic information system (GIS) software.
The building footprint data were used as an input to the CNN model to enhance its performance over dense urban cities. The building footprint assigns each grid cell a value of 1 for building cells and 0 for non-building cells.
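One possible way to turn vector footprints into such a 1/0 grid is to test whether each cell centre falls inside a footprint polygon. The Python sketch below is an illustration under stated assumptions, not the procedure used in this study: it assumes polygons given in a local metric coordinate system with the grid origin at (0, 0), a 10 m cell size, and hypothetical function names. In practice, GIS software would normally perform this rasterization.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_footprints(polygons, n_rows, n_cols, cell_size=10.0):
    """Return a grid with 1 where a cell centre lies in any polygon, else 0."""
    mask = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            cx = (c + 0.5) * cell_size
            cy = (r + 0.5) * cell_size
            if any(point_in_polygon(cx, cy, p) for p in polygons):
                mask[r][c] = 1
    return mask


# Hypothetical 20 m x 20 m building on a 4 x 4 grid of 10 m cells.
building = [(10.0, 10.0), (30.0, 10.0), (30.0, 30.0), (10.0, 30.0)]
mask = rasterize_footprints([building], n_rows=4, n_cols=4)
for row in mask:
    print(row)
```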

Flowchart of the Methodology
The workflow of this study is summarized in Figure 2. Various remote sensing data at different spatial resolutions were collected and pre-processed. The processed data were then augmented (such as with rotation and reflection) to populate the input data. Finally, the data were fed into the CNN model to train in Nice, validate in Nice, and test in Singapore.

Data Processing
With SRTM having a 30 m horizontal resolution, the reference DEM with 1 m, and Sentinel-2 with 10-60 m, all input and output layers were standardized to a 10 m resolution using the nearest neighbor approach [47]. Additionally, ground truth elevations were referenced in the Earth Gravitational Model 96 (EGM96) geoid heights [48], and therefore, all elevations were converted to geoid height.
All input data were processed and divided into training, validation, and testing datasets. The areas covered by these datasets are shown in Figure 3. The training dataset covered an area of 12 km² in Nice (the box with blue comb pattern in Figure 3a); the validation dataset covered an area of 5.2 km² (the box with red comb pattern in Figure 3a); and the testing dataset covered an area of 2.6 km × 4.8 km in the Orchard Road area of Singapore (shown in Figure 3b). The training and validation sites are mainly urbanized with buildings, and the elevation profiles vary from 0 m to 200 m. The average building height is 19.1 m (maximum 60.8 m), and buildings occupy 34% of the total area. Similarly, the test site in Orchard Road, Singapore is also a dense urban area with many high-rise buildings, with elevations ranging from 0 m to 150 m. The average building height is 24.5 m (maximum 130 m), and buildings occupy 36% of the total area.
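The nearest neighbor standardization to a common 10 m grid can be illustrated with the short Python sketch below. It is a simplified example that assumes the source and target grids share the same origin and extent and ignores map projection; the function name and the sample values are hypothetical. Each output cell takes the value of the source cell that contains its centre, so no new elevation values are created.

```python
def resample_nearest(grid, src_res, dst_res):
    """Resample a 2-D list from src_res to dst_res (metres per cell) by nearest neighbour."""
    src_rows, src_cols = len(grid), len(grid[0])
    dst_rows = round(src_rows * src_res / dst_res)
    dst_cols = round(src_cols * src_res / dst_res)
    out = []
    for r in range(dst_rows):
        # Centre of the destination cell, expressed in source-cell index units.
        src_r = min(int((r + 0.5) * dst_res / src_res), src_rows - 1)
        row = []
        for c in range(dst_cols):
            src_c = min(int((c + 0.5) * dst_res / src_res), src_cols - 1)
            row.append(grid[src_r][src_c])
        out.append(row)
    return out


# Hypothetical 2 x 2 block of 30 m SRTM cells refined to 10 m cells.
srtm_30m = [[12, 15], [20, 18]]
srtm_10m = resample_nearest(srtm_30m, src_res=30, dst_res=10)
for row in srtm_10m:
    print(row)
```

The same function coarsens a finer grid (for example, 1 m to 10 m) by picking the source cell under each 10 m cell centre.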

CNN Configuration
The MATLAB Deep Learning Toolbox was applied in this study for DEM enhancement. We used U-Net structure to design the CNN model. In U-Net structure, the initial series of convolutional layers are interspersed with max pooling layers, successively decreasing the resolution of the input image. These layers are followed by a series of convolutional layers interspersed with up-sampling operators, successively increasing the resolution of the input image. Combining these two series paths forms a U-shaped graph. The CNN model was trained by multi-channel input data. Each input channel represented a feature at 10 m spatial resolution. The datasets mentioned above were processed and divided into training, validation, and testing sets.
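As a rough illustration of the U-shaped path described above, the following Python sketch tracks only the spatial size of the feature maps as they pass through the pooling and up-sampling stages. The patch size of 32 and the depth of three pooling steps are assumptions made for this example, as is the function name; the actual layer configuration is the one shown in Figure 4.

```python
def unet_feature_sizes(patch_size=32, depth=3):
    """Spatial size after each stage of an encoder-decoder with 2x2 pooling and 2x up-sampling."""
    # Encoder: each max pooling step halves the resolution.
    encoder = [patch_size // (2 ** d) for d in range(depth + 1)]
    # Decoder: each up-sampling step doubles it again, mirroring the encoder.
    decoder = encoder[-2::-1]
    return encoder, decoder


encoder, decoder = unet_feature_sizes()
print("encoder sizes:", encoder)
print("decoder sizes:", decoder)
```

The decoder ends at the input size, which is what allows the network to output a full elevation patch rather than a single value.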
CNNs are widely applied in deep learning. They are capable of capturing the spatial and temporal dependencies in an image via the application of appropriate convolutional filters. In other words, they learn not only from the input features available for a given pixel or grid cell, but also from those of its neighbors, accounting for potential spatial relationships in that neighborhood. CNNs have been widely used in image classification and segmentation tasks, but their application to topographic data improvement remains limited.
We developed our CNN model based on the U-Net architecture, which was originally applied to biomedical imaging by O. Ronneberger et al. [38]. The architecture consists of an encoder and a decoder. The encoder uses convolutional layers to extract features from an image. The decoder uses transposed convolution to allow localization. It should be noted that the CNN model developed in this research does not include fully convolutional layers as commonly used in segmentation. Additional convolutional layers were added to allow the model to generate intact images (rather than single values or labels). The structure of the CNN model used in this paper is shown in Figure 4. There are 13 input features used for the CNN model presented in this paper: SRTM_DEM, 8 bands of Sentinel-2 multispectral imagery, 3 RGB bands from Google satellite imagery, and building footprints from OpenStreetMap Buildings. The target data were high-resolution reference DEM data (or ground truth DEM). All the input features and target data were standardized to a 10 m resolution through the nearest neighbor sampling method.
We used the MATLAB built-in function randomPatchExtractionDatastore to extract randomly positioned patches of size 32 × 32 grid cells from the 13 input features and the target data. Moreover, to populate the training data, we also used the MATLAB built-in function imageDataAugmenter to apply random reflection in the left-right direction and 90 degree rotation to the input features and target data.
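The idea behind the patch extraction and augmentation can be sketched in plain Python as follows. This is not the MATLAB API used in the study but an illustrative stand-in with hypothetical function names; it assumes rotations by a random multiple of 90 degrees. The key point it shows is that the same random position and the same transform must be applied to all 13 input channels and to the target, so that inputs and target stay aligned cell by cell.

```python
import random


def extract_patch(layer, top, left, size):
    """Cut a size x size window out of a 2-D list."""
    return [row[left:left + size] for row in layer[top:top + size]]


def flip_left_right(patch):
    return [row[::-1] for row in patch]


def rotate_90(patch):
    """Rotate a square patch by 90 degrees clockwise."""
    return [list(col) for col in zip(*patch[::-1])]


def random_training_sample(channels, target, size=32, rng=random):
    """Return (input patches, target patch) sharing one random position and transform."""
    n_rows, n_cols = len(target), len(target[0])
    top = rng.randrange(n_rows - size + 1)
    left = rng.randrange(n_cols - size + 1)
    layers = [extract_patch(ch, top, left, size) for ch in list(channels) + [target]]
    if rng.random() < 0.5:  # random left-right reflection
        layers = [flip_left_right(p) for p in layers]
    for _ in range(rng.randrange(4)):  # 0, 90, 180 or 270 degrees
        layers = [rotate_90(p) for p in layers]
    return layers[:-1], layers[-1]


# Synthetic 64 x 64 layers: 13 input channels and one target (values are made up).
target = [[r * 100 + c for c in range(64)] for r in range(64)]
channels = [[[v + k + 1 for v in row] for row in target] for k in range(13)]
inputs, target_patch = random_training_sample(channels, target, rng=random.Random(0))
print(len(inputs), len(target_patch), len(target_patch[0]))
```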
The CNN model was trained, and hyper-parameter tuning (parameters are manually defined when the model is initialized) was performed to ensure the convergence and good performance of the model. The optimal hyper-parameters after tuning are shown in Table 1. With the choice of the optimal hyper-parameters, the CNN model used 13 input features mentioned above and was trained against reference DEM data (target data). There are two different approaches for the CNN model considered in this study: iConvDEM-1 and iConvDEM-2. The flowchart of data usage for iConvDEM-1 and iConvDEM-2 is shown in Figure 5. In the first approach, iConvDEM-1, the CNN model was trained against target data with no special treatments for building and non-building features; they were all in one input dataset. The performance of the iConvDEM-1 model was promising, but the improvement for SRTM DEM was not much different from that of Kim et al. [5] using the ANN.
The second approach, iConvDEM-2, was developed based on the analysis of SRTM data versus reference DEM data. For the data used in the scope of this work, SRTM underestimated elevation over building grid cells and overestimated it over non-building grid cells. This implied that a single CNN model may not perform well for both building and non-building areas. This was a significant finding in the application of the machine learning approach for improving SRTM data over dense urban areas. Kim et al. [5] reached a similar finding as well. In order to address this issue, we proposed a new CNN model (referred to henceforth as iConvDEM-2 in this paper). The iConvDEM-2 model consisted of two separate CNN training processes, one just with buildings (CNN_b) and the other without buildings (CNN_nb). The performance of iConvDEM-2 over dense urban areas, shown later in Section 4, is much better than that of iConvDEM-1.
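The text does not spell out how the outputs of CNN_b and CNN_nb are assembled into one DEM. One plausible reading, shown in the Python sketch below purely as an assumption, is that the building footprint grid selects, cell by cell, which model's prediction is used. The function name and the sample values are hypothetical.

```python
def merge_by_mask(pred_building, pred_non_building, building_mask):
    """Take the CNN_b value where the mask is 1 and the CNN_nb value elsewhere (assumed rule)."""
    return [
        [pb if m == 1 else pnb for pb, pnb, m in zip(row_b, row_nb, row_m)]
        for row_b, row_nb, row_m in zip(pred_building, pred_non_building, building_mask)
    ]


# Made-up elevations (m) on a 2 x 2 grid.
cnn_b = [[30.0, 31.0], [32.0, 33.0]]
cnn_nb = [[5.0, 6.0], [7.0, 8.0]]
mask = [[1, 0], [0, 1]]
print(merge_by_mask(cnn_b, cnn_nb, mask))
```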

Evaluation Methods
The DEM enhancement was evaluated through visual inspection, scatterplots, and three statistical measures: the error (E), the absolute error (AE) and the root mean square error (RMSE) [5,25,49].
The error (or bias) and the absolute error measure the signed and unsigned differences, respectively, between the surveyed (yi) and simulated (Yi) elevations at each of the N grid points in the DEMs (Equation (1)).
The mean absolute error (MAE) is the average of the absolute errors over all grid points in the study area, calculated as in Equation (2).
RMSE is the square root of the average of squared differences between surveyed (yi) and simulated (Yi) elevations at all grid points (N) in the DEMs (Equation (3)). The RMSE is the standard way to compute the degree of accuracy between a set of estimates and the actual values [50].
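Equations (1) to (3) are not reproduced in this text. Based on the verbal definitions above, they can be reconstructed as follows. The sign convention (surveyed minus simulated) is an assumption, inferred from the later observation that a negative error peak corresponds to overestimation by the simulated DEM.

```latex
\begin{align}
E_i &= y_i - Y_i, \qquad AE_i = \lvert y_i - Y_i \rvert \tag{1} \\
MAE &= \frac{1}{N} \sum_{i=1}^{N} \lvert y_i - Y_i \rvert \tag{2} \\
RMSE &= \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - Y_i \right)^{2}} \tag{3}
\end{align}
```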
Both MAE and RMSE summarize the magnitude of the errors in a set of estimates. The closer their values are to zero, the better the estimates.
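The three measures can be computed as in the minimal Python sketch below. The function names and sample grids are hypothetical, and the error is taken as surveyed minus simulated, which is an assumed sign convention consistent with negative errors indicating overestimation.

```python
import math


def dem_errors(reference, simulated):
    """Per-cell error (reference minus simulated) for two equally sized grids."""
    return [
        y_ref - y_sim
        for row_ref, row_sim in zip(reference, simulated)
        for y_ref, y_sim in zip(row_ref, row_sim)
    ]


def mean_absolute_error(errors):
    return sum(abs(e) for e in errors) / len(errors)


def root_mean_square_error(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Made-up elevations (m) on a 2 x 2 grid.
reference = [[10, 20], [30, 40]]
simulated = [[13, 16], [30, 45]]
errors = dem_errors(reference, simulated)
print(errors, mean_absolute_error(errors), root_mean_square_error(errors))
```

Because RMSE squares the errors before averaging, it penalizes a few large errors (such as missed high-rise buildings) more heavily than MAE does.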

Preliminary Results
In this section, we evaluated the performance of different approaches for the validation site in Nice, France (within the box with red comb pattern in Figure 3a). The two approaches using CNN were iConvDEM-1 and iConvDEM-2 (presented in Section 3.3), which were compared against the standard ANN presented in Kim et al. [5] (referred to as ANN [1] in this study).
We first introduced the comparison between the original SRTM DEM and reference DEM. The elevation maps of SRTM DEM and reference DEM data at the validation site in Nice are shown in Figures 6a and 6b, respectively. Figure 6c shows the absolute error between the reference DEM and SRTM DEM. The mean absolute error between reference DEM and SRTM DEM was about 6.8 m, and most of the grid cells with high value absolute error were located at the right side of the map where the area is mainly urbanized with buildings.
We then analyzed the accuracy of DEM generated by the proposed schemes. Figure 7 shows the spatial distribution for absolute error versus reference DEM of SRTM DEM, ANN [1], iConvDEM-1, and iConvDEM-2. Figure 7a shows the highest values of absolute error (majority of grid cells are in dark yellow color, corresponding with absolute error around 10-20 m). Figure 7b,c show more white and light-gray color grid cells (representing absolute error below 10 m) in comparison with the display in Figure 7a. This means that there was a significant improvement for SRTM data using the iConvDEM-1 and ANN [1] model. However, iConvDEM-2 showed the best performance of all by presenting a majority of white and light-gray color grid cells (representing absolute error below 10 m) in its absolute error plot against reference DEM data (Figure 7d).
The numerical values of absolute errors and root mean square error (the lower, the better) for all approaches versus reference DEM are shown in Table 2. We can see that iConvDEM-2 outperformed the other models with the lowest values of error in all categories. This indicates that the DEM generated by iConvDEM-2 was in best agreement with the reference DEM data.
Figure 8 shows the cumulative distribution of absolute error (AE) values of all approaches versus reference DEM data. The x-axis represents AE values (in m), while the y-axis shows the cumulative distribution of AE values over all grid cells (from 0% to 100%). We see that only about 70% of grid cells of SRTM_DEM (black curve) had AE values lower than 10 m. ANN [1] (blue curve) and iConvDEM-1 (pink curve) showed promising improvement with around 90% of grid cells having AE values lower than 10 m and about 70% of grid cells having AE values lower than 5 m. iConvDEM-2 showed the best performance with 95% of grid cells having AE values lower than 10 m and about 80% of grid cells having AE values lower than 5 m.
As iConvDEM-2 outperforms the other approaches, the remaining Sections 4.2 and 4.3 assess the performance of iConvDEM-2 when applied to a validation site in Nice and a test site in Singapore.
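A point on a cumulative distribution curve such as those in Figure 8 is simply the share of grid cells whose absolute error falls below a given threshold. The Python sketch below illustrates the calculation on made-up absolute errors; the function name and the values are hypothetical and do not come from the study data.

```python
def fraction_below(abs_errors, threshold):
    """Share of grid cells whose absolute error is strictly below the threshold."""
    return sum(1 for ae in abs_errors if ae < threshold) / len(abs_errors)


# Made-up absolute errors (m) for ten grid cells.
abs_errors = [1, 2, 4, 6, 8, 9, 12, 15, 3, 25]
for threshold in (5, 10):
    share = fraction_below(abs_errors, threshold)
    print(f"AE < {threshold} m: {share:.0%} of cells")
```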

Validation of iConvDEM-2 in Nice, France
The comparisons of elevation maps among SRTM DEM, reference DEM and iConvDEM-2 are shown in Figure A1 (in Appendix A). The elevation map generated by iConvDEM-2 showed much clearer land shapes of buildings and roads when compared with SRTM DEM. Moreover, the RMSE (versus reference DEM) of iConvDEM-2 was reduced significantly, to 4.8 m from 9.2 m of SRTM DEM (a 48% reduction).
For better visualization, we selected a zoomed-in site (or sub-area) covering a 1.4 km × 1.4 km area at the center of the validation site in Nice, France. The sub-area is seen bounded by the red box inside satellite imagery in Figure A1. The comparison of elevation maps between SRTM DEM, reference DEM and iConvDEM-2 over the sub-area is shown in Figure 9.
We computed the bias against the reference DEM to evaluate the quality of SRTM and iConvDEM-2. The bias map in Figure 9e was constructed by simply calculating the differences (errors) between reference DEM and SRTM DEM. Similarly, Figure 9f shows the differences between reference DEM and iConvDEM-2. We could see that iConvDEM-2 showed far less bias (error) than SRTM DEM. In Figure 9f, the bias values are mostly within −10 m to 10 m. The light blue color implies overestimation of elevation by iConvDEM-2 over non-building land areas, and the light yellow color implies underestimation of elevation by iConvDEM-2 over building areas. We also observed a similar behavior (overestimation over non-building land areas and underestimation over building areas) with SRTM DEM. However, the light red color and dark blue color in Figure 9e demonstrate much higher bias with SRTM DEM (within −30 m to 30 m). The elevation data improvement with iConvDEM-2 over SRTM DEM is further demonstrated in Figure 10.
In Figure 10e, a single peak close to zero shows that iConvDEM-2 is very close to reference DEM. Moreover, the large bias of SRTM DEM is demonstrated in Figure 10c, where the cumulative distribution function of absolute error between reference DEM and SRTM DEM is shown. Only 20% of the data points in SRTM DEM had an absolute error of less than 5 m, while another 60% of data points had an absolute error within the range 5-15 m. The rest of the data points (20%) had an absolute error of more than 15 m. On the other hand, more than 80% of the data points in iConvDEM-2 had an absolute error of less than 5 m, which is shown in Figure 10f.

Testing of iConvDEM-2 in Orchard Road Area, Singapore
To test the performance of iConvDEM-2, which was trained and validated in Nice, we selected a test site in the dense urban Orchard Road area in Singapore (Figure 3b). The quality of the input SRTM DEM data and of the output generated by iConvDEM-2 was assessed against a reference DEM provided by Singapore's Building and Construction Authority (BCA). The satellite image of the test site, together with the elevation maps of SRTM DEM, reference DEM and iConvDEM-2, is shown in Figure A2 (in Appendix A). The elevation generated by iConvDEM-2 matched the reference DEM more closely than the SRTM DEM did. The RMSE (versus reference DEM) of iConvDEM-2 was reduced significantly, to 12.8 m from 18.5 m for SRTM DEM (a 30.8% reduction).
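As a quick check on the reported figures, the RMSE and its relative reduction can be computed as sketched below. The helper names `rmse` and `percent_reduction` are illustrative and not taken from the paper. The RMSE values passed in are the ones quoted in the text for the Nice validation site and the Singapore test site.

```python
import numpy as np

def rmse(reference, dem):
    """Root mean square error between two DEMs on the same grid."""
    diff = np.asarray(reference, dtype=float) - np.asarray(dem, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def percent_reduction(rmse_before, rmse_after):
    """Relative RMSE reduction, in percent of the original RMSE."""
    return 100.0 * (rmse_before - rmse_after) / rmse_before

# RMSE values quoted in the text (metres).
nice = percent_reduction(9.2, 4.8)         # validation site in Nice
singapore = percent_reduction(18.5, 12.8)  # test site in Singapore
```

With these inputs, the reduction rounds to 48% for Nice and to 30.8% for Singapore, matching the values stated in the text.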
As in the validation process, we selected a zoomed-in site (or sub-area) covering a 1.4 km × 1.4 km area within the test site in the Orchard Road area for better visualization. The sub-area is bounded by the red box inside the satellite imagery in Figure A2a. The comparison of elevation maps between SRTM DEM, reference DEM and iConvDEM-2 over the sub-area is shown in Figure 11. Figure 11a is a satellite image of the test area delineating the land shapes; Figure 11b-d are the elevation maps of SRTM DEM, reference DEM and iConvDEM-2, respectively. iConvDEM-2 again shows clearer shapes of buildings and roads than the original SRTM DEM. In addition, building heights generated by iConvDEM-2 matched well with the reference DEM and clearly showed better quality than SRTM DEM. The significant improvement of iConvDEM-2 is also reflected in the analysis in Figure 12.
Figure 12b,e show the frequency distributions of error for SRTM DEM and iConvDEM-2. In Figure 12b, the peak is around −15 m, implying that SRTM DEM overestimates elevation: the majority of its data points (pixels) have an error of about 15 m relative to the reference DEM. In Figure 12e, a single peak close to zero shows that iConvDEM-2 agreed quite well with the reference DEM. Moreover, the large bias of SRTM DEM is demonstrated in Figure 12c, which shows the cumulative distribution function of absolute error between the reference DEM and SRTM DEM. Only 40% of the data points of SRTM DEM had an absolute error of less than 10 m, while 20% had an absolute error higher than 20 m. On the other hand, more than 80% of the data points of iConvDEM-2 had an absolute error of less than 10 m, as shown in Figure 12f. However, about 10% of the data points still had an absolute error higher than 20 m, which implies some weakness in the application of iConvDEM-2 to the test site in the Orchard Road area of Singapore. This can be attributed to the complexity of building profiles in the Orchard Road area, which has many high buildings with a maximum height of more than 100 m; the Nice area used to train iConvDEM-2 has a lower building density and a maximum building height of 60.8 m.

Application of iConvDEM-2 in Other Areas with AW3D Input Data
In this section, we show the applicability of the proposed CNN method to an urban area in Jakarta, Indonesia. The input DEM was AW3D at 2.5 m spatial resolution. The other input channels (8 bands of Sentinel-2 multispectral imagery, 3 RGB bands from Google satellite imagery and building footprints from OpenStreetMap Buildings) were also prepared for the application site. Figure 13a is a satellite image of the application site showing the land shapes of the dense urban area in Jakarta; Figure 13b,c are the elevation maps of the input AW3D DEM and iConvDEM-2, respectively. Even though no reference DEM was available for further evaluation, the elevation map generated by iConvDEM-2 showed much clearer land shapes of buildings and roads than the original AW3D.
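To illustrate how the input channels listed above could be assembled for a multi-channel CNN, the sketch below stacks them into a single array. It is only an illustration under stated assumptions: all rasters are taken to be already resampled to one common grid, the channel order (DEM, Sentinel-2, RGB, footprint) is arbitrary, and no normalization is applied. The function name `stack_channels` is hypothetical, and the pre-processing actually used in the study is not described in this section.

```python
import numpy as np

def stack_channels(dem, sentinel2, rgb, footprint):
    """Stack co-registered rasters into one (channels, height, width) array.

    dem and footprint are 2-D (height, width); sentinel2 and rgb are
    3-D (bands, height, width). The channel order is an assumption.
    """
    layers = [dem[np.newaxis], sentinel2, rgb, footprint[np.newaxis]]
    shapes = {layer.shape[1:] for layer in layers}
    if len(shapes) != 1:
        raise ValueError("all inputs must share the same grid")
    return np.concatenate(layers, axis=0).astype(np.float32)

# Placeholder tile of 64 x 64 cells filled with zeros.
h, w = 64, 64
x = stack_channels(
    np.zeros((h, w)),      # input DEM (1 channel)
    np.zeros((8, h, w)),   # Sentinel-2 bands (8 channels)
    np.zeros((3, h, w)),   # Google satellite RGB (3 channels)
    np.zeros((h, w)),      # building footprint mask (1 channel)
)
```

With one DEM channel, 8 Sentinel-2 bands, 3 RGB bands and one footprint mask, the stacked tile has 13 channels.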

Conclusions
This paper presented the use of a CNN model to improve SRTM DEM in dense urban cities with different treatments for built and non-built features. The CNN model used a U-Net configuration with SRTM DEM, Sentinel-2 multispectral imagery and Google imagery as input channels, while a high-resolution reference DEM was used as target data. To better address the high percentage of buildings within urban cities, the iConvDEM-2 model was introduced with two training processes, one with and one without buildings. As a result, iConvDEM-2 outperformed the single model that handled both building and non-building features (iConvDEM-1). Moreover, the iConvDEM-2 model also performed better than the ANN-based approach reported by Kim et al. [5].
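One plausible way to combine the outputs of the two training processes is to select, cell by cell, the building prediction inside building footprints and the non-building prediction elsewhere. This section does not state how iConvDEM-2 merges its two outputs, so the sketch below is an assumption for illustration only; the function name `merge_predictions` and the toy values are hypothetical.

```python
import numpy as np

def merge_predictions(pred_building, pred_ground, building_mask):
    """Take the building-model elevation where the mask is set and the
    non-building-model elevation everywhere else (assumed merge rule)."""
    mask = np.asarray(building_mask, dtype=bool)
    return np.where(mask, pred_building, pred_ground)

# Toy 2 x 2 example in metres (made-up values, not data from the study).
pred_building = np.array([[40.0, 42.0], [38.0, 41.0]])
pred_ground = np.array([[5.0, 6.0], [4.0, 7.0]])
mask = np.array([[1, 0], [0, 1]])  # 1 = building cell
merged = merge_predictions(pred_building, pred_ground, mask)
```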
The iConvDEM-2 model was trained in Nice, France and validated at a different site in Nice, where it performed significantly better than SRTM DEM. At the validation site, the RMSE of iConvDEM-2 was about 50% lower than that of SRTM DEM, and the land shapes, buildings and roads were much clearer in iConvDEM-2 than in SRTM DEM. Most of the absolute errors (versus reference DEM) from iConvDEM-2 were below 5 m. This was a significant improvement because the original SRTM DEM had a majority of its absolute error values above 10 m.
iConvDEM-2 also performed well in testing at a faraway location (the Orchard Road area in Singapore). The RMSE reduction remained substantial at about 30%, and the land shapes, buildings and roads were far clearer than in the original SRTM DEM. Over the Orchard Road area, more than 80% of the absolute errors (versus reference DEM) of iConvDEM-2 were below 10 m, whereas only 40% of those of the original SRTM DEM were.
The iConvDEM-2 model presented in this paper was shown to enhance the quality of SRTM DEM. In addition, the CNN approach proposed in this study can also be applied to different input DEMs at any spatial resolution, such as AW3D [21] and TanDEM-X [51]. Moreover, the trained CNN model can be applied to any site that has an urbanization profile similar to that of the training site.
Generally, the method can be implemented to improve satellite DEM data for any urban city. The work presented in this paper obtained a good-quality, high-resolution DEM effectively, efficiently and at low cost. The results have promising potential for application to many urban cities, especially in developing countries where high-quality DEM data are usually unavailable or very costly. Good-quality DEMs at high spatial resolution, such as those generated by the proposed CNN-based DEM improvement scheme, are an indispensable input to flood simulations that assess the impacts of a changing climate and sea level rise.