Vertical Accuracy of Freely Available Global Digital Elevation Models (ASTER, AW3D30, MERIT, TanDEM-X, SRTM, and NASADEM)

: Freely available global digital elevation models (DEMs) are important inputs for many research ﬁelds and applications. During the last decade, several global DEMs have been released based on satellite data. ASTER and SRTM are the most widely used DEMs, but the more recently released, AW3D30, TanDEM-X and MERIT, are being increasingly used. Many researchers have studied the quality of these DEM products in recent years. However, there has been no comprehensive and systematic evaluation of their quality over areas with variable topography and land cover conditions. To provide this comparison, we examined the accuracy of six freely available global DEMs (ASTER, AW3D30, MERIT, TanDEM-X, SRTM, and NASADEM) in four geographic regions with di ﬀ erent topographic and land use conditions. We used local high-precision elevation models (Light Detection and Ranging (LiDAR), Pleiades-1A) as reference models and all global models were resampled to reference model resolution (1m). In total, 608 million 1x1 m pixels were analyzed. To estimate the accuracy, we generated error rasters by subtracting each reference model from the corresponding global DEM and calculated descriptive statistics for this di ﬀ erence (e.g., median, mean, root-mean-square error (RMSE)). We also assessed the vertical accuracy as a function of the slope, slope aspect, and land cover. We found that slope had the strongest e ﬀ ect on DEM accuracy, with no relationship for slope aspect. The AW3D30 was the most robust and had the most stable performance in most of the tests and is therefore the best choice for an analysis of multiple geographic regions. SRTM and NASADEM also performed well where available, whereas NASADEM, as a successor of SRTM, showed only slight improvement in comparison to SRTM. MERIT and TanDEM-X also performed well despite their lower spatial resolution.


Introduction
Digital elevation models (DEMs) and the topographic and terrain variables derived from these models (e.g., slope aspect, slope, channels, hillslopes) are fundamental inputs for environmental and landscape modeling and spatial analysis [1][2][3][4][5], especially for research in geomorphology [6], hydrology [7], and geology [8]. For several decades, DEM products have been created at different spatial scales (e.g., global or national) mainly using remote sensing data [9]. As different remote sensing data and methods are employed during the development of DEMs, it has become increasingly important to assess the accuracy and performance of such products to provide insights into their limitations [10] and develop correction procedures [11,12]. The potential artifacts (errors) and inconsistencies identified in DEM products usually arise from distortions caused by atmospheric, terrain, and sensor conditions at the moment of the data acquisition [10,13,14].
Due to the potential analytical bias that results from artifacts and inconsistencies in DEMs, several studies have investigated the quality of such products, especially for those developed at a global scale. These investigations usually focused on the height (vertical) accuracy of the DEM [15,16] or the model's performance in different applications, such as river network generation, the acquisition of geomorphological metrics, and flood simulation [12,17]. Most of the assessments compare freely available DEM products with GPS survey points [18] or with DEMs developed based on photogrammetric aerial images or on laser scanning (e.g., LiDAR) [19]. For instance, the vertical accuracy of the Shuttle Radar Topography Mission (SRTM; released in 2003) ranges from 2.18 to 21.70 m [10,20] and for the Advanced Spaceborne Thermal Emission and Reflection satellite (ASTER, released in 2006), the accuracy ranges from 4.56 to 7.10 m [10,20]. These high-resolution global DEM products, developed based on radar data (SRTM) and data from optical sensors (ASTER), were one of the first freely available DEMs at a global scale. Recently, new freely available global-scale DEM products have been released, such as the TanDEM-X DEM [15], the Multi-Error-Removed Improved-Terrain (MERIT) DEM [21], the ALOS Global Digital Surface Model (AW3D30) [22], and the NASADEM [23]. Because these products are so new (TanDEM-X has been freely available since late 2018, MERIT 2017, AW3D30 and NASADEM since the beginning of 2020), fewer studies have evaluated their accuracy [24][25][26][27][28]. If their accuracy can be confirmed, these new DEM products, which are based on more current remote sensing data and better processing methods, will be highly valuable because of their ability to capture natural or man-made changes in the topography of terrestrial surfaces [29].
DEM products are continuously being improved, and new versions are being released constantly. Usually, the newest versions are developed by applying analytical improvements that reduce errors or inconsistencies identified in the previous DEM versions, especially based on assessment studies [9,30]. Many researchers have assessed the quality of the global-scale DEM products in recent years [26,28,30]. However, there has been no comprehensive and systematic evaluation of the quality of these freely available global DEMs over areas with variable topography and land cover conditions. The accuracy of the DEMs can largely vary depending on the latitude, vegetation, and topography. In addition, previous studies have only used a few hundred GPS points for ground control [30,31] or LiDAR-based DEM averaged to global DEM resolution [26] which results in a loss of information. In our study, we evaluated the performance of the freely available DEM products (ASTER, AW3D30, MERIT, TanDEM-X, SRTM, and NASADEM) in four different regions of the world with different topography and land cover characteristics. We used locally derived LiDAR-based DEMs and the Pleiades 1A data as reference models to estimate the accuracy and resampled all the global models to the reference model resolution (1 m) which resulted in a total of 608 million 1x1 m pixels. This enabled us to perform the accuracy assessment at the highest possible resolution (1 m) and capture the continuous nature of topography. We estimated the vertical accuracy of the global DEM products as a function of slope, slope aspect, and land cover type. To the best of our knowledge, the accuracy of NASADEM has not been systematically estimated before against ground truth and in comparison with SRTM.

Study Areas
Four study areas were chosen based on two main criteria: (1) the availability of a LiDAR or other high-precision reference model, and (2) the diversity of the land cover and relief, as these are the main factors known to influence DEM accuracy. We chose study areas where a reference model was freely available and where different topographic conditions (flat, hilly, and mountainous areas) coexisted with various land cover types ( Figure 1).

Figure 1.
Locations of the four study areas, and images of their topography. The reference digital elevation models (DEMs) were obtained using local-scale Light Detection and Ranging (LiDAR) surveys (Estonia, Norway, New Zealand) and Pleiades 1A images (China).

Estonia
The Estonian study area is located in Southern Estonia, and covers 225 km 2 ( Figure 1). About 52% of the study area is covered by closed forest and about 17% is covered by open forest. About 28% of the study area is arable or otherwise cultivated land. The study area includes the small village of Puka, with 565 inhabitants, and several smaller villages. The elevation ranges from 40 to 218 m according to the LiDAR-based reference model (see Section 2.3.1.), and the topography is relatively flat, with some hilly landscapes in the eastern part of the study area ( Figure 1).

China
The Chinese study area is located in the Ningxia Hui autonomous region (south of Mongolia). The area is bounded by the Helan Mountains and has tectonic faults that resulted in a diverse relief ( Figure 1). In the 103 km 2 total area, 52% is covered by open land or sparse vegetation, 24% is covered by artificial landscapes (e.g., residential and industrial areas), and 20% by grasslands. The area's elevation ranges from 1063 to 1766 m ( Figure 1).

New Zealand
The study area covers 111 km 2 and is located in the southern part of North Island of New Zealand in the administrative districts of Lower Hutt City, Upper Hutt City, and South Wairarapa. The dominant land cover types are dense evergreen forests. The other land cover types are grasslands, open forest, shrubs, and arable land. The region's relief is characterized by mountains and gorges, with elevation ranging from 32 to 828 m.

Estonia
The Estonian study area is located in Southern Estonia, and covers 225 km 2 ( Figure 1). About 52% of the study area is covered by closed forest and about 17% is covered by open forest. About 28% of the study area is arable or otherwise cultivated land. The study area includes the small village of Puka, with 565 inhabitants, and several smaller villages. The elevation ranges from 40 to 218 m according to the LiDAR-based reference model (see Section 2.3.1.), and the topography is relatively flat, with some hilly landscapes in the eastern part of the study area ( Figure 1).

China
The Chinese study area is located in the Ningxia Hui autonomous region (south of Mongolia). The area is bounded by the Helan Mountains and has tectonic faults that resulted in a diverse relief ( Figure 1). In the 103 km 2 total area, 52% is covered by open land or sparse vegetation, 24% is covered by artificial landscapes (e.g., residential and industrial areas), and 20% by grasslands. The area's elevation ranges from 1063 to 1766 m ( Figure 1).

New Zealand
The study area covers 111 km 2 and is located in the southern part of North Island of New Zealand in the administrative districts of Lower Hutt City, Upper Hutt City, and South Wairarapa. The dominant land cover types are dense evergreen forests. The other land cover types are grasslands, open forest, shrubs, and arable land. The region's relief is characterized by mountains and gorges, with elevation ranging from 32 to 828 m.

Norway
The Norwegian study area lies between Jotunheimen and Rondane mountainous areas, and covers 193 km 2 . It is located in the Froni municipality within Innlandet County. The elevation ranges from 335 to 1662 m, but it is relatively flat and does not have many steep slopes. The Vinstra river crosses the study area. Approximately 44% of the study area is covered by forests, 37% by grasslands and 12% by open forest. Water bodies, arable land, and wetlands cover the rest of the area. The SRTM and NASADEM were not available for this study area because it is located above 60 • N latitude. Therefore, we only analyzed the ASTER, AW3D30, MERIT, and TanDEM-X DEMs to test the accuracy of these alternative global DEMs for a region where SRTM and NASADEM data are not available. Table 1 summarizes the characteristics of the DEMs used in the present study.

ASTER
The ASTER Global DEM (GDEM) version 2 (v2) was created photogrammetrically from a compilation of cloud-free ASTER stereopair images [32,33]. The sensor is carried aboard the Terra satellite and the images stereopairs were acquired by nadir-and after-looking angles in the near-infrared band [30,34]. A major advantage of the along-track mode of data acquisition (as compared to cross-track acquisition) is that the images that form the stereopairs are acquired a few seconds apart (rather than days) under uniform environmental and lighting conditions [34]. The ASTER GDEM2 was released in 2011 and showed improvements compared with the first version, such as better georeferencing, inclusion of more scenes acquired between 2008 and 2011, and a smaller correlation kernel, and has higher spatial resolution [33]. However, ASTER GDEM2 is known to have anomalies and artifacts that limit its usefulness [9,35]. Previous studies have found that ASTER GDEM2 performs well over bare areas but is less accurate over areas with a high forest cover [30,36].

AW3D30
The ALOSWorld3D 30 m DEM (AW3D30; version 3.1) was developed based on millions of images acquired by panchromatic optical sensor (PRISM) onboard the Advanced Land Observing Satellite (ALOS) [22]. The stereoscopic images were acquired along-track at nadir, backward and forward views and with 2.5 m of spatial resolution [37]. The first version of AW3D30 was published in 2016 and since then the dataset has been updated to improve the absolute/relative height accuracies with additional calibrations and void filling [22]. The latest version was released in April 2020, which was also used in the current study. Although, the accuracy estimation of the AW3D DEM is available only for the 5 m-resolution commercial version (root-mean-square error (RMSE) of 5 m), studies that have compared different DEM products have shown good accuracy for AW3D30 [38,39].

MERIT
The MERIT DEM was created to provide an error-reduced DEM product. The DEM was developed by combining data acquired from SRTM with data from the ALOS World 3D data with 30 m resolution (AW3D-30m v1), and used error-removal techniques that reduced the bias and other sources of noise [21]. The reported accuracy is ±12 m (90th percentile of the error range) [21]. Studies suggested that the reduced-error DEM performed well [12,28], but that artifacts remain in the DEM product [40].

TanDEM-X DEM
The TanDEM-X DEM was developed based on interferometric synthetic aperture radar (InSAR) images acquired by X-band radar and full polarimetric synthetic aperture radar sensors on two satellites (TerraSAR-X and TanDEM-X) [15,41]. The sensors imaged the Earth's surface at least twice between 2010 and 2015, and additional images were acquired for areas with complex topography, such as mountain regions. The sensors aboard the satellites orbited the Earth in a twin formation with long track distances ranging from 300 to 500 m, thereby supporting both across-track and along-track interferometry [42]. This approach supported the development of a DEM with high accuracy. The TanDEM-X DEM product with a spatial resolution of 12 m that was released in September 2016 is reported to have a horizontal and vertical absolute accuracy of <10 m, and a relative vertical accuracy of 2 m for slopes ≤11 • and 4 m for slopes >11 • [43]. As the DEM with 12 m resolution is only available upon approval of a scientific proposal and requires payment of a processing fee, we instead used the completely free TandDEM-X DEM global product (Version 1.0) with 90 m spatial resolution that was released in late 2018 [26]. The quality of the interferometric SAR (InSAR) DEMs are known to be affected by the particular land cover and terrain characteristics of the area under illumination and the specific acquisition geometry used for the data take [15]. SAR acquisitions over complex terrain are often affected by geometric distortions (including foreshortening, layover, and shadow) [15,44], and the X-band signal can partially penetrate vegetation [26] and volume scattering, creating some uncertainty in the height estimation.

SRTM
The SRTM DEM was developed based on the images acquired by two synthetic aperture radars aboard Space Shuttle Endeavour [45,46]. The shuttle flew and mapped continental areas between 60 • N and 60 • S for 11 days in February 2000. One of the sensors, which used a C-band radar system, was responsible for contiguous mapping coverage, while the other sensor, which used an X-band system, was responsible for generating data along discrete 50-km-wide swaths [45]. The SRTM DEM that we used, with a spatial resolution of 30 m, was developed based on the C-band radar interferometry data and has a reported accuracy of ±16 m [47]. However, the vertical accuracy has also been reported to be <9 m [48] and to be 4.31 m (±14.09 m) in mountain regions [49]. SRTM is known to suffer from voids due to shadow-casting [50]. The accuracy of SRTM is known to suffer from the similar InSAR methodology-related issues as TanDEM-X (see Section 2.2.4). In our study, we used the void-filled SRTM DEM version 3 (SRTMV3) dataset that was released in 2014 [50].

NASADEM
The NASADEM (released in February 2020) was created by reprocessing of the SRTM radar data, and merging it with other DEM datasets, such as ASTER, ICESat and GLAS. The main objective was to eliminate voids and other limitations that were present in the SRTM dataset [23]. The NASADEM is supposed to be the successor of the SRTM DEM dataset [51]. To the best of our knowledge, there are no studies on the accuracy of the NASADEM dataset.

LiDAR DEMs
We used DEM products derived based on Light Detection and Ranging (LiDAR) data as reference models for the study areas in Estonia, Norway, and New Zealand (China is described in the next section). LiDAR-based DEMs have good accuracy [57], but LiDAR technology cannot penetrate a dense vegetation canopy and the datasets do not cover large areas [32]. For the Estonian study area, we downloaded the DEM from the Estonian Land Board's Geoportal [58]. The reported vertical accuracy by the Estonian Land Board is 9 to 14 cm. The reference DEM for the Norwegian study area was acquired from Høydedata portal [59] and the reported vertical accuracy is 3 to 12 cm. The New Zealand reference DEM was downloaded from Linz Data Service [60]. The reported vertical accuracy of the New Zealand reference DEM is 10 to 15 cm. All reference DEMs had a 1 m horizontal resolution.

Pleiades-1A DEM
For the Chinese study area, we used the Pleiades-1A DEM because LiDAR data were not available. The Pleiades-1A DEM for the Chinese study area was developed using five image stereopairs acquired by the satellites of the Pleiades mission on 23 April, 17 July, 24 July, and 25 July of 2014 [61]. These images were processed using the photogrammetry module of the ERDAS IMAGINE software. The final DEM was acquired by averaging the point cloud from the image stereopairs within 1 m pixels using continuous-curvature splines in tension and a tension factor of 0.75 [61]. The data were downloaded from OpenTopography portal [62]. No estimated vertical accuracy was provided for the current DEM product, but studies that used similar approaches have reported accuracies from 30 cm [63] to 0.5 m [64].

Pre-processing
All the datasets had different spatial resolution and different horizontal and vertical reference systems. All global DEMs (except TanDEM-X) originally represented orthometric heights from the EGM96 geoid. However, all of the reference DEMs used different geoids. To make all the study areas comparable, we (1) projected all the datasets to the WGS84 coordinate system; and (2) transferred all DEMs to the EGM96 geoid by subtracting the difference between the local geoid and the EGM96 geoid from the reference DEM values. The elevation values for the TanDEM-X DEM represent ellipsoidal heights, so to convert them to orthometric heights, we subtracted the EMG96 height from the TanDEM-X DEM. All geoid models (except EST-GEOID2017) were downloaded in GeoTIFF format from Agisoft Geoids [65]. EST-GEOID2017 [66] was obtained in ASCII format from the Estonian Land Board. Previous studies have resampled reference models to the global DEM resolution, but this loses information. Therefore, we resampled all global DEMs using the nearest neighbor method and aligned the results spatially to the reference model's spatial resolution (1 m). Pre-processing was performed in ArcGIS 10.6 [67].

Accuracy Assessment
All global elevation models are, strictly speaking, digital surface models (DSMs) [30,32,68]. However, in most applications, they are treated as being equivalent to DEMs [69,70], so for simplicity, we will use the term "DEM" in the rest of this paper. To assess the vertical accuracy of the global DEMs, we created the error rasters on a cell-by-cell basis by subtracting the reference model's value from the corresponding value in each global DEM: where ∆h is the height difference (error) between the global and reference DEM, which was defined as the error term, Z model is the height in the global DEM, and Z ref is the height in the reference DEM. We used the map algebra module in ArcGIS 10.6 to generate the error rasters.
To analyze the effect of vegetation on the accuracy of the elevation models, we used the global 100 m-resolution vegetation cover data for the year 2015 from the Copernicus Global Land Cover Layers-Collection 2, which can be freely downloaded from the Copernicus site [71]. The data collection for the global DEMs took place from 2000 to 2017. We initially chose the study areas in relatively remote areas with small land cover change. In addition, to consider the land cover changes, we firstly performed visual check of historical changes based on Google Earth images and CCI long-term global land cover time series (300 m resolution). We identified that the land cover has not changed significantly over the last 10 years. To better align the land cover dataset for the year 2000 (SRTM and NASADEM), we used the Global Forest Change forest cover dataset [72] and added all the forests to the 2015 year land cover to ensure that bare areas are not overestimated in the land cover. Only the Estonian study area had noticeable changes in the forest cover from 2000 to 2015. The NZ and Norwegian study areas remained practically unchanged, and the Chinese study are had no forest. The original 21 land cover classes (Supplementary S1) were generalized and reclassified into 10 classes ( Table 2). To separate the effect of slope from the effect of land cover, we only used slopes of less than 15 • in the land cover analysis. Snow and ice 10 Water To analyze the effect of slope and aspect on the accuracy of the DEM, we derived slope and aspect from the reference model DEM (LiDAR or Pleiades-1A). Slope was reclassified into 8 classes (Table 2). Gdulová et al. [24] reported that the relationship between the DEM accuracy and terrain aspect is stronger for greater slopes. Therefore, we extracted slopes steeper than 15 • (slope classes 4 to 8 in Table 2) to estimate the effect of aspect on the DEM accuracy. In addition, to eliminate the effect of dense vegetation on the aspect analysis, we only used land cover classes 4, 5, and 7 (classes 9 and 10 were not present on slopes steeper than 15 • ) in the aspect analysis. We also reclassified aspect from 0 • to 360 • into 8 cardinal directions (Table 2).
In addition, we calculated roughness for all the study areas because we assumed that higher roughness (variability or irregularity in elevation) results in lower DEM accuracy. We performed correlation analysis which revealed that roughness was highly correlated with slope (0.92 < r2 < 0.98). Therefore, roughness would not have explained the significant amount of additional variation in the errors, and for this reason we did not include it in the further study.
We further analyzed the error rasters together with the land cover, slope, and aspect rasters using Python function libraries. We used the Python rasterio and pandas libraries [73] to transform all the raster data into dataframes in which all spatially aligned error cells were joined with the respective land cover, slope, and aspect class.
For every error raster, we generated a histogram of the errors (Supplementary S2). The histograms let us estimate the error distribution and detect any deviations from a normal distribution. We calculated error metrics across all study areas and also as a function of the land cover, slope, and aspect classes. We used the following error statistics suggested by Höhle and Höhle [74]: the mean error (ME), median error (MedE), root-mean-square error (RMSE), and standard deviation (STD). Several DEMs did not have a normal error distribution (Supplementary S1) and we therefore also calculated more robust accuracy measures suited for non-normal error distributions [74]: the normalized median absolute deviation (NMAD) and the 25% and 75% quantiles. We used the Python NumPy library [75] to calculate the error metrics. In addition, we generated boxplots to visualize the errors. The Python scripts used for these calculations are available in the GitHub repository: https: //github.com/LandscapeGeoinformatics/globalDEM_accuracy.

Overall Vertical Accuracy
For all study areas and elevation models, except for a few cases in ASTER and TanDEM-X, ME and MedE were positive (Table 3, Figure 2). That is, all the global models generally overestimated the elevation compared to the reference models, which was expected. From the spatial distribution of the errors, it is clear that forested areas had strong positive errors compared to the reference models (Figures 1 and 3). However, in the cases of the Estonian, New Zealand, and Norwegian study areas, ASTER had a negative ME and MedE. The negative errors were spatially occurring in areas with low vegetation cover (Figure 3). The accuracy of AW3D30 was the most stable based on RMSE and STD across the study areas and in most cases also performed the best; however, it slightly overestimated the elevation in all four study areas ( Figure 3). Of the six global models, ASTER and in most cases also MERIT performed the worst based on RMSE. Moreover, MERIT showed the largest variation in error values (NMAD between 2.11 and 10.41 m) (Table 3, Figure 2). Of the four study areas, the biggest variation in errors (STD) was observed in New Zealand, where the relief was the most rugged and was combined with tall and dense vegetation (Table 3, Figure 2). Table 3. Error measures for the ASTER, AW3D30, MERIT, TanDEM-X, SRTM, and NASADEM digital elevation models for the four study areas. SRTM and NASADEM data were not available for the Norwegian study area because of its high latitude. Error terms: ME, mean error; STD, standard deviation; 25% and 75%, quantiles; MedE, median error; NMAD, normalized median absolute deviation; RMSE, root-mean-square error.

Effect of Slope and Aspect on Accuracy
There was a clear effect of slope on the vertical accuracy of the DEMs (Figure 4, Supplementary S3a-d) in the Chinese, New Zealand, and Norwegian study areas. The Estonian study area showed no slope effect because most of the study area is flat and 95% of the pixels had a slope less than 10 • (Supplementary S4). The effect of slope on accuracy was smallest for the AW3D30, SRTM, and NASADEM. The ASTER, MERIT, and TanDEM-X DEMs exhibited similarly strong effects of slope on their accuracy; with increasing slope, the RMSE and variation of the error (STD) increased (Figure 4, Supplementary S3a-d). MedE showed slightly different behavior among the DEMs. MedE did not have clear increasing nor decreasing pattern for ASTER. For all the other DEMs, MedE in general increased, except for the Chinese study area.
Aspect had a smaller effect on the accuracy of the DEMs than slope ( Figure 5). The effect of aspect on accuracy was strongest for the ASTER model; however, there was no clear pattern across study areas ( Figure 5). In the Norwegian study area, the E and SE slopes had MedE values of 6.04 and 3.24 m, respectively, whereas the opposing W and NW slopes had MedE values of −7.47 and −6.48 m, respectively. We observed a similar pattern for the New Zealand study area, where E, SE, and S slopes had contrasting MedE values to the opposite W, NW, and N slopes, respectively. However, the Estonian and Chinese study areas did not show similar patterns.
The MERIT model also showed high variation in errors (STD) for all aspects and all study areas except Estonia. However, there was no clear dependency on aspect. The errors in the AW3D30 model had the smallest variation but showed no strong dependency on aspect. In the case of New Zealand and Chinese study areas, AW3D30 even exhibited contrasting patterns. SRTM and NASADEM also showed relatively low variation. For the SRTM and NASADEM, the E and SE slopes had the poorest accuracy (high MedE) among the study areas and NW slopes had the highest accuracy (low MedE) ( Figure 5, Supplementary S5a-c). TanDEM-X did not show any dependency on aspect. Error rasters for the ASTER, MERIT, TanDEM-X, and SRTM digital elevation models for the four study areas. SRTM data were not available for the Norwegian study area because of its high latitude. We eliminated extreme values from the classification schema for symbology to ensure the comparability of the study areas.

Effect of Slope and Aspect on Accuracy
There was a clear effect of slope on the vertical accuracy of the DEMs (Figure 4, Supplementary S3a-d) in the Chinese, New Zealand, and Norwegian study areas. The Estonian study area showed Error rasters for the ASTER, MERIT, TanDEM-X, and SRTM digital elevation models for the four study areas. SRTM data were not available for the Norwegian study area because of its high latitude. We eliminated extreme values from the classification schema for symbology to ensure the comparability of the study areas.

Effect of Land Cover on Accuracy
We found that all DEMs had systematically higher elevations than the actual land surface in forested areas in land cover classes 1 and 2 ( Figure 6, Supplementary S6a-c). MedE was consistently positive for all the DEMs, except for the ASTER model in the Estonian study area, and RMSE ranged from 3.54 to 13.66 m for the closed forest and 2.82 to 10.5 m for the open forest across study areas. We observed the highest errors and variation (STD) for forested areas in the New Zealand study area. From all six global models, the AW3D30 was the most accurate over the forested areas across all study areas.
The DEMs were generally more accurate for the herbaceous and cultivated areas (land cover classes 4 and 5) than for the forested areas ( Figure 6, Supplementary S6a-d). In most cases, the global models slightly overestimated (positive MedE) the elevation over the herbaceous and cultivated areas. The RMSE was less than 5 m in most cases, except for the ASTER DEM, for which the RMSE was mostly greater than 5 m and often greater than 10 m. The ASTER DEM also often underestimated the elevation for herbaceous and cultivated areas. There was no single best performing DEM over the herbaceous and cultivated areas. Some DEMs that showed high accuracy in some study areas did not have good accuracy in some others (e.g., MERIT). However, AW3D30 showed most consistent and in most cases also the best accuracy across all study areas.
The magnitude of the error for the urban and built-up areas (land cover class 6) was similar to the values for herbaceous and cultivated areas. For urban and built-up areas, the MedE values were −5.56 to 4.39 m, which was similar to the values for herbaceous and cultivated areas. All DEMs, except for TanDEM-X, strongly overestimated the elevation over water (land cover class 10) in the Chinese study area (Supplementary S6b).

Overall Accuracy
AW3D30 provided the most robust and accurate DEM for all regions, which aligns with previous studies that found that the vertical RMSE was below 5m in flat areas [16,76] and rises up to 12 or to 14 m in areas with more complex relief [76].
The next-best DEMs were SRTM and NASADEM. For the SRTM, previous studies reported a vertical accuracy (RMSE) of 5 m [30] using GPS ground-control points, and 14.9 m (RMSE) by using LiDAR as the reference model [32]. Our results agree with these previous studies, as we found the overall RMSE to be 6.59, 10.00, and 13.07 m in the Estonian, Chinese, and New Zealand study areas, respectively. The recently released NASADEM, which is supposed to be the successor for the SRTM, showed only slightly better accuracy with the overall RMSE found to be 6.39, 8.53, and 12.08 m in the Estonian, Chinese, and New Zealand study areas, respectively. For the MERIT model, we found an overall accuracy (RMSE) of 3.01, 12.43, 13.58, and 10.49 m for the Estonian, Chinese, New Zealand, and Norwegian study areas, respectively. For the TanDEM-X DEM, overall RMSE was 7.16, 11.10, 15.05, and 6.84 m for the Estonian, Chinese, New Zealand, and Norwegian study areas, respectively. These fall within the expected accuracy range in previous research: 12 m for MERIT [21] and 10 m for TanDEM-X [15]. However, some studies have identified better accuracies. For example, Hawker et al. [26] estimated the RMSE to be 2.32 and 3.10 m for MERIT and TanDEM-X, respectively. Our error estimates are probably higher because of the more complex terrain in our study areas; however, in the flat Estonian study area, our results are similar to these previous results. MERIT and TanDEM-X are more sensitive than SRTM to relief, mainly due to the lower spatial resolution; this leads to intra-pixel variation, which is exacerbated on steep slopes.
ASTER performed the worst of the models in all study areas (RMSE 9.22 to 13.52 m). It also had the highest uncertainty (high STD) across study areas for a range of topographic conditions and land cover types. Several studies have found similar accuracy for ASTER [32,33]. The ASTER DEM also includes artifacts that are likely due to cloud cover, mismatches between different scenes, and processing techniques which is its major shortcoming [9,30].

Effect of Slope and Aspect on Accuracy
The slope had the strongest effect on the accuracy of the global DEMs. For all six global DEMs, the uncertainty increased with the increasing slope. The ASTER, MERIT and TanDEM-X were the most affected by slope. The RMSE rose from between 8. 25 S3a-d). This agrees well with previous studies [24,26,77,78]. We resampled all the global DEMs to agree with the reference model resolution (1 m), which results in higher intra-pixel variation on steeper slopes. This effect can be clearly seen in the error map for the New Zealand study area (Figure 3), which had the most complex topography and where intra-pixel variation is clearly visible for the lower-resolution MERIT and TanDEM-X DEMs. The accuracy of the AW3D30, SRTM and NASADEM models was the least affected by slope and their variation in error did not increase so much as compared to ASTER, MERIT and TanDEM-X.
ASTER showed the biggest variation in errors (STD) and AW3D30 showed the smallest variation due to aspect, but no clear dependency on aspect. The high variation of the errors and the overall random response to aspect is likely caused by artifacts in the ASTER model. The visual inspection of numerous artifacts in the ASTER model ( Figure 3) and the high variation in errors across the study areas demonstrated the model's lower quality, which was previously described by Purinton and Bookhagen [10].
For the SRTM and NASADEM, we observed a small systematic dependency on aspect in which opposite directions (e.g., NW and SE) had contrasting accuracy. This was observed previously by Szabó et al. [79] and Gorokhovich and Voustianiouk [80]. They related this dependency to the orbit of Space Shuttle Endeavour, which was perpendicular to the NE-SW line. The quality of synthetic aperture radar (SAR) images depends upon the view direction. In the view direction, inclined terrain causes foreshortening or even layover (i.e., when the beam encounters the top of an object before it encounters the base) [81]. Previous studies have identified similar issues for the TanDEM-X DEM. Gdulová et al. [24] observed the poorest accuracy for east-facing slopes; this resulted from the presence of areas with data acquisition only from an ascending orbit. However, our study did not identify this pattern. East-facing slopes only exhibited higher errors in Norwegian study area, and this was also observed for the MERIT and ASTER DEMs. The highest errors occurred on the steep banks of the Vinstra river valley, which is mostly oriented from NE to SW (Figure 3).

Effect of Land Cover on the Accuracy
In general, the magnitude of the error for the different land cover types differed more among study areas than among the different DEMs. This indicates that the land cover affects a DEM's accuracy in ways that are location-specific. Even for a given land cover type, the vegetation characteristics (density, height) vary enough among biomes that the variation causes significant variation in the errors of the DEMs. For SAR-based DEMs (TanDEM-X, SRTM, NASADEM), volume scattering causes uncertainty [79]. In the case of volume scattering in a forest, leaves reflect radar signals which then hit more leaves and branches, and multiple scattering occurs until the signals exit the vegetation. A percentage of the incoming radar is therefore volumetrically scattered back to the SAR sensor, giving the vegetation a brighter signature. However, if the forest height is homogeneous, then the forest surface will still mimic the underlying topography. Forested areas caused a positive bias (i.e., overestimation of the true elevation) in all global DEMs and all study areas. Elevation estimation in forested areas performed most poorly in the New Zealand study area, which is mostly covered by indigenous forests: Nothofagus spp. (beech), broadleaved angiosperm trees, Agathis australis (kauri), and other conifers (predominantly podocarps) form a tall and dense canopy [82]. This canopy may explain the larger positive bias in the New Zealand study area than in the other study areas.
Urban areas produced elevation estimates that were 2 to 5 m higher than the reference model, which was less than the error magnitude we expected. For example, González-Moradas and Viveen [30] found that buildings severely affected the performance of ASTER. As only the Estonian and Chinese study areas (Supplementary S4) had urban areas in flat areas and most of the urban areas were dominated by low buildings or roads, this explains the generally good performance of the global models in the urban parts of these study areas.
The ASTER, MERIT, and SRTM DEMs overestimated the elevation over bodies of water. For the SRTM and MERIT DEMs, this is likely caused by how SAR interferometry functions, since low coherence or low backscatter increases the noise over water [15]. Although SRTM v3 voids have been filled by using the ASTER DEM to improve the SRTM DEM's water mask [50], ASTER strongly overestimated elevations above bodies of water in the Chinese study area (Figure 3), because small bodies of water were not depicted in the ASTER water mask [83]. Therefore, the SRTM correction was not successful in the Chinese study area. Interestingly, TanDEM-X is also based on interferometry, but did not exhibit large errors over the bodies of water, probably because of having a higher quality water mask.
In principle, all the global DEMs we analyzed can be considered DSMs. This also explains their tendency to overestimate elevation over vegetated areas because the surface that reflects the signal (e.g., a tree canopy rather than the forest floor) exists at some height above the true ground level. However, in case of SAR-based elevation models, there exist some exceptions for areas where the SAR signal penetrates the surface by some meters-e.g., in cases of ice, snow or vegetation [84,85]. It has been shown that for ice and snow, up to 7 m underestimation can take place [86]. In case of forested areas, the overestimation will still happen because the forest height usually exceeds the penetration. However, for MERIT, an approximate vegetation correction has been applied for vegetation but with no comparable correction for buildings [70]. As a result, MERIT exhibited slightly better performance over densely forested areas than the other models.

Effect of Spatial Resolution
The ASTER, AW3D30, SRTM, and NASADEM had 30 m spatial resolution, but MERIT and TanDEM-X had only 90 m spatial resolution. Despite its higher resolution, ASTER performed worse than MERIT and TanDEM-X. However, in rugged terrain such as the New Zealand study area, we observed high variation of errors and a higher RMSE that can probably be explained by the low spatial resolution, which causes high error variation due to the DEM's inability to capture variation within the large pixels. Elevation can change significantly on steep slopes within a horizontal distance of 90 m. The elevation estimates around the center of the pixel are likely to be correct, but upslope values are likely to be underestimated and downslope values to be overestimated. This was most apparent for the New Zealand study area ( Figure 3). However, Hawker et al. [26] demonstrated that on floodplains, the 90 m TanDEM-X DEM performed better than the 30 m SRTM DEM and similarly to the 90 m MERIT DEM. For geomorphological mapping, high-resolution (12 m) TanDEM-X has been found to yield a detailed and widely consistent landscape representation, and most major landforms can also be easily allocated in the SRTM; however, ASTER does not allow identification of microscale landforms because of the high number of artifacts [87]. González-Moradas and Viveen [30] found that the 12 m resolution TanDEM-X DEM displayed the most accurate elevation values but that the 30 m SRTM DEM also performed well in terms of hydrological applications compared to other 30 m DEMs.

Limitations of Our Study
One of the limitations of our study was the land cover dataset. To ensure the consistency across all study areas, we used the Copernicus Global Land Cover dataset, with 100 m resolution, which is lower than the resolution of the analyzed DEMs. The coarser spatial resolution might have created errors in land cover classification at the edges of land cover types, where closed forest might have been classified as open forest and vice versa, causing incorrect attribution of errors to some of the land cover types. Another issue is the temporal change of land cover. The global DEM data were collected from 2000 to 2017 and land cover can change significantly. However, our study areas were in remote regions with minimal land cover change and for the year 2000, we used the high-resolution Global Forest Canopy Height dataset [88] to update the 2015 year land cover dataset to ensure that forested areas are the most accurate. In addition, seasonal differences (leaves on-off, dry-wet) affect the collection of the SAR based elevation data because these factors may either increase or decrease the backscatter [89]. Dry conditions usually increase the penetration. The SRTM data are collected in February when it is winter in the northern hemisphere and most of the trees have no leaves, which should increase the penetration. However, evergreen forests still have leaves all year round and are harder to penetrate and the timing of the wet and dry seasons vary across the world.
Several studies have attempted to develop methods to improve the accuracy of global DEMs by data fusion. For example, Yue et al. [90] combined SRTM-1, ASTER GDEM v2, and ICESat laser altimetry data by using ASTER GDEM v2 data as the elevation source for the SRTM void filling and the ICESat GLAS data to increase the accuracy of the ASTER data within the void regions by applying an artificial neural network model. Su et al. [91] developed a regression model based on tree height, canopy cover, and terrain slope to correct the SRTM DEM data in vegetated mountain areas. However, they noted that the limited availability of small-footprint LiDAR data may make it difficult to perform large-scale correction for the SRTM DEM. The recently released NASADEM, which is the updated version of SRTM, was improved by using the new unwrapping strategy [92] to reduce voids, and applying vertical and tilt adjustments based on ground control points derived from the ICESat laser altimeter to improve the elevation estimation over forested areas and snow/ice [23]. The NASADEM development team has also evaluated the extent to which ASTER GDEM3 with cloud removal can be used outside the SRTM latitude range to extend the NASADEM to polar areas [23]. However, based on our study and previous research, AW3D30 would be much more promising. Many studies covered large and geographically diverse regions, and the main reason for using freely available global DEMs is that no high-quality local DEM is available. Moreover, it is unlikely that these regions have high-quality field data over large regions that can be used to improve the global DEM. However, some new global-level datasets, such as Global Forest Change forest height dataset [88], would be promising to apply vegetation correction to NASADEM. Our results show that currently, the most accurate freely available DEM is AW3D30. However, higher-quality commercial products, for example, global TanDEM-X 12m-resolution DEM, have shown higher elevation accuracy [15,31].

Conclusions
The accuracy of the global DEMs depended greatly on the unique characteristics of each region. The ASTER model was the least accurate and had the highest uncertainty across all study areas, topographic conditions, and land cover types. AW3D30 showed the smallest uncertainty and highest accuracy across all study areas. However, NASADEM also showed very similar accuracy but it does not cover the northernmost latitudes. MERIT and TanDEM-X performed well despite their lower spatial resolution, and as they also cover higher latitudes. We found slope to be the most important factor affecting DEM accuracy. The smallest bias in the elevation values was detected on flat areas (slope <5 • ), and the bias increased with increasing slope, thereby increasing variation of the errors. Aspect did not systematically affect DEM performance, which agrees with previous studies. Densely forested areas caused height overestimation by all DEMs. However, MERIT was slightly less affected by the forested areas, mainly due to the vegetation removal procedure. Other DEMs could benefit from similar vegetation correction by employing the new Global Forest Canopy Height dataset. . SRTM data were not available for this study area because of its high latitude. Error terms: ME, mean error; STD, standard deviation; 25% and 75%, quartiles; MedE, median error; NMAD, normalized median absolute deviation; RMSE, root-mean-square error, Supplementary S4: Distribution of pixels by slope class, slope aspect, and land cover classes. Slope class values: 1, 0 • -5 • ; 2, 5 • -10 • ; 3, 10 • -15 • ; 4, 15 • -20 • ; 5, 20 • -25 • ; 6, 25 • -30 • ; 7, 30 • -35 • ; 8, >35 • . Land cover classes: 1, Closed forest; 2, Open forest; 3, Shrubs; 4, Herbaceous vegetation; 5, Cultivated area; 6, Urban/built up area; 7, Bare land/sparse vegetation; 8, Wetland; 9, Snow and ice; 10, Water, Supplementary S5a: Summary error statistics for the global digital elevation models for the Estonian study area by slope aspect. Error terms: ME, mean error; STD, standard deviation; 25% and 75%, quartiles; MedE, median error; NMAD, normalized median absolute deviation; RMSE, root-mean-square error, Supplementary S5b: Summary error statistics for the global digital elevation models for the Chinese study area by slope aspect. Error terms: ME, mean error; STD, standard deviation; 25% and 75%, quartiles; MedE, median error; NMAD, normalized median absolute deviation; RMSE, root-mean-square error, Supplementary S5c: Summary error statistics for the global digital elevation models for the New Zealand study area by slope aspect. Error terms: ME, mean error; STD, standard deviation; 25% and 75%, quartiles; MedE, median error; NMAD, normalized median absolute deviation; RMSE, root-mean-square error, Supplementary S5d: Summary error statistics for the global digital elevation models for the Norwegian study area by slope aspect. SRTM data were not available for this study area because of its high latitude. Error terms: ME, mean error; STD, standard deviation; 25% and 75%, quartiles; MedE, median error; NMAD, normalized median absolute deviation; RMSE, root-mean-square error, Supplementary S6a: Summary error statistics for the global digital elevation models for the Estonian study area by land cover class. Land cover classes: 1, Closed forest; 2, Open forest; 3, Shrubs; 4, Herbaceous vegetation; 5, Cultivated area; 6, Urban/built up area; 7, Bare land/sparse vegetation; 8, Wetland; 9, Snow and ice; 10, Water. Error terms: ME, mean error; STD, standard deviation; 25% and 75%, quartiles; MedE, median error; NMAD, normalized median absolute deviation; RMSE, root-mean-square error, Supplementary S6b: Summary error statistics for the global digital elevation models for the Chinese study area by land cover class. Land cover classes: 1, Closed forest; 2, Open forest; 3, Shrubs; 4, Herbaceous vegetation; 5, Cultivated area; 6, Urban/built up area; 7, Bare land/sparse vegetation; 8, Wetland; 9, Snow and ice; 10, Water. Error terms: ME, mean error; STD, standard deviation; 25% and 75%, quartiles; MedE, median error; NMAD, normalized median absolute deviation; RMSE, root-mean-square error, Supplementary S6c: Summary error statistics for the global digital elevation models for the New Zealand study area by land cover class. Land cover classes: 1, Closed forest; 2, Open forest; 3, Shrubs; 4, Herbaceous vegetation; 5, Cultivated area; 6, Urban/built up area; 7, Bare land/sparse vegetation; 8, Wetland; 9, Snow and ice; 10, Water. Error terms: ME, mean error; STD, standard deviation; 25% and 75%, quartiles; MedE, median error; NMAD, normalized median absolute deviation; RMSE, root-mean-square error, Supplementary S6d: Summary error statistics for the global digital elevation models for the Norwegian study area by land cover class. SRTM data were not available for this study area because of its high latitude. Land cover classes: 1, Closed forest; 2, Open forest; 3, Shrubs; 4, Herbaceous vegetation; 5, Cultivated area; 6, Urban/built up area; 7, Bare land/sparse vegetation; 8, Wetland; 9, Snow and ice; 10, Water. Error terms: ME, mean error; STD, standard deviation; 25% and 75%, quartiles; MedE, median error; NMAD, normalized median absolute deviation; RMSE, root-mean-square error.

Conflicts of Interest:
The authors declare no conflict of interest.