1. Introduction
Digital elevation models (DEMs) and terrain variables derived from DEMs (e.g., slope and aspect) are the basic data of science and engineering technology research. They play a necessary role in geomorphology [
1], volcanology [
2], glacier mass balance [
3], flooding modeling [
4], climatic modeling [
5], etc. DEMs are now primarily generated using remote sensing techniques because of the benefits that fewer people can map a large spatial area at a lower cost [
6]. Remote sensing techniques include photogrammetry, Interferometric Synthetic Aperture Radar (InSAR) and Light Detection And Ranging (LiDAR) [
7].
Errors of DEM can propagate throughout the data processing in subsequent investigations they are used in and may adversely affect the accuracy of the results [
8]. Therefore, it is very important to understand the quality of DEM to be used for certain applications [
9]. At present, many agencies have released global DEM products, which can provide topographic information for researchers. These distributing organizations give the global accuracy of DEM products, which cannot represent the quality in the local area because the quality of DEM varies on a regional scale. Hence, the quality of DEMs should be quantitatively evaluated and used with caution in the local area [
10]. Shuttle Radar Topography Mission (SRTM) DEM [
11] and Advanced Space Borne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM) [
12] are the most commonly used global DEMs. Many scientific studies have reported the quality of these two DEMs in local regions [
13,
14,
15,
16,
17]. TerraSAR-X add-on for Digital Elevation Measurement (TanDEM-X) DEM, released in September 2016, has attracted users’ attention due to its unprecedented geometric resolution, precision and accuracy as a global DEM product [
18,
19]. It resulted in three types of data with resolutions of 12, 30 and 90 m, respectively. Before the final TanDEM-X DEM was released, the German Aerospace Agency (DLR) released TanDEM-X intermediate DEM (IDEM). The biggest difference between IDEM and the final DEM is that IDEM was generated only based on the data obtained in the first year, which means that most areas are only covered once. There has been some research on the quality assessment of TanDEM-X IDEM and the final TanDEM-X DEM product. The research used RTK-GNSS ground measurement points [
20], ICESat data [
21], globally distributed GPS points [
22], local DEMs [
23], LiDAR DEM [
24] or high-quality DSM derived from small footprint airborne LiDAR [
25] as the reference data to assess the vertical accuracy of TanDEM-X IDEM [
20,
23] or TanDEM-X DEM [
21,
22,
24,
25] globally [
21,
22] or in local study area [
20,
23,
24,
25] with different altitudes, topography and vegetation [
26,
27,
28,
29,
30,
31,
32]. The researchers also compared the obtained accuracy with other DEMs such as SRTM and ASTER GDEM. To assess a DEM’s accuracy, the accuracy of the reference should be at least three times better than that of the DEM to be assessed [
14]. The references the above studies used could satisfy the accuracy requirements. When there was no very high accuracy reference data, Grohmann et al. [
33] used the 12-m resolution TanDEM-X DEM with higher accuracy among the DEMs as the reference to assess the differences’ performance of 30-m resolution TanDEM-X DEM, ASTER GDEM, SRTM and ALOS AW3D DEM on selected Brazilian sites. Although they could not get the exact accuracy of the assessed DEMs, they could still provide insights into these DEMs’ quality, through the comparison of differences’ performance between elevation models.
These existing studies on the quality assessment of the 12- and 30-m TanDEM-X DEM product are rare in local areas of China. Keys et al. [
28] used the result of catchment delineations obtained using 12-m TanDEM-X DEM as the reference to compare the results obtained using SRTM, ACE2 and ASTER GDEM in Nam Co, China. These existing studies also paid attention to the relationship between DEM’s quality and geographical features. In the analysis of the factors that affect the quality of DEM, researchers have carried out a large number of studies on the relationship between the quality and slope, aspect and land cover [
13,
14,
24,
34]. However, most of the studies analyze each factor independently, omitting the combined influence of the factors on the quality. Satge et al. [
14] assessed the quality of DEM under different land cover where the slope was within 0–2° and 10–15°. Zhang et al. [
34] found that the main factor affecting the quality of DEM was slope among slope, aspect and land cover. Besides, the ways in which DEMs have been created are also important to DEMs’ quality. TanDEM-X DEM, ASTER GDEM and SRTM were created from the fusion of many observations. Hu et al. [
35] evaluated the influence of stack numbers on the quality of ASTER GDEM. Gdulová et al. [
25] conducted a similar exploration of TanDEM-X DEM. TanDEM-X DEM and SRTM were generated from the InSAR technology, so the local incidence angle may impact the quality of the final DEMs as the geometry of the incidence angle strongly influences the quality of the reflected radar beam. Kolecka et al. [
15] assessed the accuracy of SRTM X-band DEM in relation to radar beam geometry.
There are two main methods for evaluating the accuracy of DEMs currently [
14]: comparing the assessed DEM (i.e., the DEM to be assessed) with higher accuracy points and DEMs, respectively. When taking a DEM as the reference, the assessed DEMs need to be registered with the reference DEM because there exist systematic offsets between the models in both horizontal and vertical directions due to the effect of sensor errors, the orientation bias of reference ellipsoids and topographic relief [
35]. Some disadvantages of the related studies on co-registration were that the registration accuracy depended on the search step distance because of indirect computation [
13,
23], only the existence of translations was considered [
13,
15,
23,
30,
35,
36] or the co-registration procedure relied on manual recognition [
16].
Aiming to the limitations of the existing studies on quality assessment of the DEMs, the objective of this article is to provide some additional insights on the quality assessment of TanDEM-X DEMs, ASTER GDEM and SRTM. We selected study areas in China because there is a necessity to test whether the assessed results are special in China. Due to the limit of the conditions, we could not obtain the ground truth data. High-resolution and high-accuracy data are almost classified in every country. It is very hard to obtain, especially in China. Thus, this study is not the traditional accuracy assessment of the DEMs, but the quality assessment. DEMs’ quality includes many aspects, not only the absolute accuracy but also differences’ performance in relation to various influencing factors, which is exactly what this article studies. The consistency between these DEMs and the reference data can indicate the quality. The higher is the accuracy of the reference data, the better, so that our results could provide more convincing guidance on DEMs’ quality for related application researchers. As a result, ICESat/GLAS points with 14-cm absolute vertical accuracy [
37] were selected as the reference points. It should be noted that one ICESat/GLAS point has a diameter of 70 m on the ground, which is larger than the resolution of most DEMs, and the center spacing of adjacent points on the same track is about 172 m [
37]. Moreover, the accuracy of ICESat/GLAS points in mountainous regions is not high [
25]. Due to the small number, non-uniform distribution and lack of describing the terrain details of ICESat/GLAS points in the wide range of the area [
22], taking a high-resolution and high-accuracy DEM with continuous and complete coverage as the reference to assess the quality of DEMs is also suitable. The 12-m resolution TanDEM-X DEM with less than 10 m absolute vertical accuracy [
21] is the best we could utilize at this time. More specific goals of this article are as follows: (1) We assess and compare the vertical quality of 12- and 30-m resolution TanDEM-X DEM, 30-m resolution ASTER GDEM and 30-m resolution SRTM DEM on four selected Chinese sites; analyze the quality in terms of slope, aspect and landcover; control the variable of the slope when analyzing the relationship between land cover and quality; analyze the quality in terms of the number of images used in the fusion process for generating the final DEMs; and analyze the quality in terms of local incidence angle of SRTM C-band DEM for the first time, trying to provide a new perspective for quality assessment of SRTM and other DEMs whose incidence angle files are available. (2) We use an improved Least Z-Difference (LZD) method [
38] which considers nine transformation parameters that may exist between two DEMs to accomplish the co-registration procedure. (3) We use a Quantile–Quantile (Q-Q) plot to identify if the DEM errors follow a normal distribution and choose proper statistical indicators accordingly.
3. Methods
There are two main methods for evaluating the accuracy of DEMs currently [
14]. One is comparing the DEM with higher accuracy points, such as the GPS data and elevation control points collected from a large-scale topographic map. The accuracy of the reference points should be at least three times better than that of the DEM to be assessed. Another is comparing the DEM with higher accuracy DEMs, such as the DEM data generated by large-scale topographic maps, LiDAR DEMs and other higher-accuracy DEMs. Due to the lack of ground truth data, we cannot assess the absolute vertical accuracy of the DEMs in this study. We aim to assess differences’ performance to explore the consistency between the assessed DEMs and the reference data. The consistency can also reflect the quality of DEMs under the circumstance that the reference data is better than the assesses DEMs. Moreover, we can still borrow the above ideas for evaluating the accuracy of DEMs to assess DEMs’ quality in this study.
ICESat/GLAS GLAH14 points with 14-cm absolute vertical accuracy but size of 70-m diameter were chosen as the reference points to assess and compare the vertical quality of TanDEM-X DEM with a pixel size of 12 and 30 m and ASTER GDEM, SRTM with a pixel size of 30 m. It is also should be noted that the vertical accuracy of ICESat/GLAS points in mountainous regions is not high. Due to the small number, non-uniform distribution and lack of describing the terrain details of ICESat/GLAS points in the wide range of the study area, taking a high-resolution and high-accuracy DEM with continuous and complete coverage as the reference to assess the quality of DEMs is also suitable. Thus, 12-m resolution TanDEM-X DEM with less than 10 m absolute vertical accuracy, which is the best we could utilize at this time, was chosen as the reference DEM of the other three DEMs with a pixel size of 30 m to perform a model-to-model comparison. Previous studies [
28,
33,
48] have also used the 12-m TanDEM-X DEM as the reference to test the height quality and the performance in applications of many independent datasets. Through the analysis of the height differences’ performance between the assessed DEMs and the reference, we could evaluate the quality of the DEMs.
Figure 2 describes the general workflow of this study.
3.1. Data Preprocessing
Since the horizontal and vertical datums of these elevation data are different (
Table 2), they need to be unified [
15]. The plane position difference of the WGS84 ellipsoid and the Topex/Poseidon ellipsoid is very small [
49,
50], so the difference in the vertical direction is mainly considered [
51]. The WGS84 ellipsoid, the native vertical datum of TanDEM-X DEM, was selected as the vertical reference datum in this study.
The vertical datum of ASTER GDEM and SRTM is the EGM96 geoid. The deviations of the EGM96 from the WGS84 ellipsoid provided by GAMMA software were used to convert the vertical datums of these two DEMs. The vertical datum of ICESat/GLAS points is the Topex/Poseidon ellipsoid. The MATLAB tool provided by NSIDC was used to read the ICESat/GLAS data, and the parameter d_deltaEllip was used to convert the vertical datum from the T/P ellipsoid to the vertical reference datum. Two quality control parameters provided by NSIDC were selected to remove low-quality ICESat/GLAS points. The first parameter elev_use_flg indicates whether the elevation value should be used or not, and only the data with a value of 0 were selected. The second parameter sat_corr_flg indicates whether saturation phenomenon occurred during the acquisition and if it was corrected, and only the non-saturated points were selected. Moreover, the ICESat/GLAS points whose absolute difference with other DEMs is greater than 50 m were removed. We kept the 50-m threshold to eliminate the gross errors of the ICESat/GLAS points so as to ensure a high-quality database [
52], especially considering that the accuracy of ICESat/GLAS points in mountainous regions is not high [
25]. Finally, all elevation data were projected to the UTM coordinate system using ArcGIS [
53].
3.2. Data Co-Registration
When performing the model-to-model comparison, the assessed DEMs need to be registered with the reference DEM. Although the datum unifying is implemented, there still exist systematic offsets between the assessed DEMs and the reference DEM in both horizontal and vertical directions due to the effect of sensor errors, the orientation bias of reference ellipsoids and topographic relief [
35]. An improved Least Z-Difference (LZD) method was applied to data co-registration in this study. Generally, the DEM co-registration model can be expressed by Equation (1) [
38]:
where
X,
Y and
Z are the three-dimensional coordinates of the reference DEM, respectively.
x,
y and
z are the three-dimensional coordinates of the assessed DEM, respectively.
,
and
are the translations in three-dimensional direction of the assessed DEM, respectively.
is the three-dimensional scale factor of the assessed DEM. R is the rotation matrix (Equation (2)).
where
r11 = cos
× cos
,
r21 = cos
× sin
+ sin
× sin
× cos
,
r31 = sin
× sin
− cos
× sin
× cos
,
r12 = −cos
× sin
,
r22 = cos
× cos
− sin
× sin
× sin
,
r32 = sin
× cos
+ cos
× sin
× sin
,
r13 = sin
,
r23 = −sin
× cos
and
r33 = cos
× cos
.
,
and
are three rotation angles of the assessed DEM around three axes, respectively.
The co-registration process of two DEMs includes two steps. The first step, which is the key procedure, is solving the conversion parameters , , , M, , and (Equation (1)). The second step is substituting the solved parameters into Equation (1) and then converting all points of the DEM to be registered to obtain a registered DEM.
The Least Z-Difference (LZD) algorithm proposed by Rosenholm and Torlegard [
38] is a surface matching algorithm aiming at DEM co-registration in the field of photogrammetry based on the least square method. The LZD algorithm established an observation equation (Equation (3)) for each grid cell on the DEM to be registered based on the sum of squares of height differences between it and the corresponding point on the reference DEM is minimum.
where λ is the height difference between the assessed DEM and the reference DEM,
= M − 1 and
and
are the elevation changes of the reference DEM in its
X and
Y directions per unit length, respectively.
For each grid cell on the DEM to be registered, there is an observation equation. The seven transformation parameters , , , M, , and can be solved using the least square method. Finally, we can use these seven parameters to transform the original coordinates of the DEM to be registered to complete the DEM registration.
However, in the co-registration process of two DEMs, the three-dimensional scale factor may be different. Therefore, we considered that the three axes might correspond to different scale factors (Equation (4)). This improved LZD method we proposed increases the number of transformation parameters from seven to nine.
where
,
and
are the three-dimensional scale factor minus 1, respectively.
The new observation equation is shown as Equation (5):
3.3. Quality Assessment of DEMs
The elevation information of DEM is its important application. Therefore, the quality analysis of DEM mainly focused on its vertical quality in this study. Residuals are the height differences between the assessed DEM and the reference data at the corresponding point, which can be used to measure the vertical quality of the DEM. Mean Error (ME) (Equation (6)), Root Mean Square Error (RMSE) (Equation (7)) and Standard Deviation (STD) (Equation (8)) are the most widely used indicators for describing the characteristics of errors. In the case, where the errors follow a normal distribution, these indicators perform well [
22,
54,
55]. However, the errors follow a non-normal distribution in most cases of DEMs [
54,
55]. In the case of non-normality, Median Error (MED) (Equation (9)), Median Absolute Deviation (MAD) (Equation (10)) and LE90 are more robust quality indicators [
54]. For normally distributed observations, the linear error at 90% confidence level is LE90 = STD × 1.65. As height differences tend to follow a non-normal distribution, here the LE90 is directly equated to the 90th percentile of the sorted absolute differences calculated by the minimum rank method, i.e., the smallest value in the list. In other words, 90% of the data are less than or equal to that value [
22,
54].
where
is the height difference between the assessed DEM and the reference data at the
ith point and
N is the total number of errors.
Outliers corrupt statistical accuracy measures, so the three-sigma rule was used to remove the errors satisfying Equation (11). All statistical measures presented in
Section 4 are outlier-removed.
To check whether the DEM errors in this study follow a normal distribution, a normality test should be conducted. The Quantile–Quantile (Q-Q) plot can identify whether a large number of data approximates a normal distribution. It is a scatter plot with the quantiles of the residuals on the vertical axis and the quantiles of the standard normal distribution value on the horizontal axis. If the actual distribution is normal, the Q-Q plot should yield a straight line. Statistical tests can also be used to investigate whether data originate from a normal distribution, but, in the case of large datasets or outliers, these tests are often rather sensitive. Therefore, we prefer the visual method of the Q-Q plot [
54].
Then, we analyzed the quality in terms of slope, aspect and land cover. We also focused on the ways in which these DEMs have been created. Since TanDEM-X DEM, ASTER GDEM and SRTM were created from the fusion of many observations, we evaluated the effect of the number of images used in the fusion process on their quality. For TanDEM-X DEM and SRTM, which were created from SAR images, the local incidence angle may impact the quality of the DEM as the geometry of the incidence angle strongly influences the quality of the reflected radar beam [
15]. Because we can easily get all the original local incidence angle files used to produce the final combined SRTM in each study area, we further evaluated the effect of the local incidence angle on SRTM’s quality.
To assess the vertical quality in terms of slope, we calculated the slope of every pixel in the study areas based on the TanDEM-X DEM with a pixel size of 12 m using ArcGIS:
where
and
are the elevation changes in
X and
Y directions per unit length, respectively. When using ICESat/GLAS as the reference data, we divided the slope into six levels referring to Wang et al. [
56]: 0–2° (flat), 2–7° (undulating), 7–15° (soft), 15–25° (gentle), 25–35° (steep) and more than 35° (very steep). The elevation differences were then summarized according to the binned slopes. MAD and LE90 were calculated within each slope level for assessing DEM’s quality. When using TanDEM-X DEM as the reference data, the slope bin was chosen as one degree because of the large number of differences in each bin. MAD and LE90 were also calculated.
To assess the vertical quality in terms of aspect, we also calculated the aspect of every pixel in the study areas based on the TanDEM-X DEM with a pixel size of 12 m using ArcGIS, and then the aspect was divided into eight directions: N (337.5–360° and 0–22.5°), NE (22.5–67.5°), E (67.5–112.5°), SE (112.5–157.5°), S (157.5–202.5°), SW (202.5–247.5°), W (247.5–292.5°) and NW (292.5–337.5°). MAD and LE90 were calculated within each direction for assessing DEM’s quality.
Vertical quality is also correlated to land cover. Some land cover types, such as forests, vegetation, waters, etc., will affect the elevation quality obtained by ranging and image matching. We used FROM-GLC10 land cover map with a 10-m finer resolution distributed by Gong et al. [
47]. It was then bilinearly interpolated to the position of reference data after UTM projection. Ten land-cover classes were defined in the original model, but, for simplicity of analysis, three broad classes were defined referring to Varga and Bašić [
13]: (1) cropland, grassland and bareland; (2) shrubland; and (3) forest. No permanent snow or ice was found in study areas. Due to their characteristics, snow, ice, water bodies and other land cover types were not considered [
14]. Because of the complex topography and land cover in China, the elevation error may be affected by a combination of many factors [
57]. To avoid the influence of slope, the quality of DEM under different land cover was assessed in each slope range.
When assessing the vertical quality in terms of the number of images used in the fusion process, we only took ICESat/GLAS as the reference data, avoiding the number of TanDEM-X DEM images from affecting the results of other DEMs. The number of images of each DEM was extracted at every reference point. MAD and LE90 were calculated within each quantity level.
When assessing SRTM’s quality in terms of the local incidence angle, we still only used ICESat/GLAS as the reference data. The local incidence angle was classified into three zones: (1) areas where the local incidence angle was less than the look angle, indicating a surface facing towards the radar; (2) areas where the local incidence angle was greater than the look angle, indicating a surface opposite to the radar; and (3) areas where the local incidence angle was equal to the look angle, indicating a flat surface (
Figure 3). There were several local incidence angle images at each point. The zones of the point on different images maybe not the same. To avoid the joint influence and get pure results, we only analyzed the points which were in the same zone on all local incidence angle images. MAD and LE90 were calculated for each zone.
5. Discussion
5.1. Overall Quality
We first discuss the results obtained with ICESat/GLAS points as the reference data. Since the accuracy of ICESat/GLAS points in mountainous regions is not high [
25], many points do not meet the data preprocessing requirements in Sichuan, causing a large number of points with low quality eliminated. Positive values of ME mean that the elevation of these DEMs is generally higher than that of the reference. Part of the reason is the influence of the vegetation canopy, where short-wavelength SAR or optical satellite sensors cannot penetrate [
58]. This overestimation was noted in previous studies [
32,
35,
59,
60].
TanDEM-X DEMs showed the best quality. The next-best DEM was SRTM. ASTER GDEM had the worst quality. This reflects the superiority of InSAR technology in obtaining elevation in vegetation coverage areas compared with optical technology. MAD can be used to estimate the standard deviation more resistant to outliers in the dataset. TanDEM-X DEM had the smallest MAD, meaning that it had fewer gross errors and was more robust [
36]. C-band penetrates vegetation better than X-band. Therefore, the elevation obtained by C-band SAR should be closer to the surface elevation than that obtained by X-band SAR. However, the results of this study are the opposite of the theory. Part of the reason is that the fusion of multi-baseline and multi-aspect TanDEM-X DEMs improves the quality of the final TanDEM-X DEM product, which was also discussed by Liu et al. [
58]. Because the acquisition time of the images used to generate the DEMs is different, we should also consider the effects of changes in vegetation and terrain over a period of time on experimental results.
In study areas with more undulating terrain, the overall quality of the 12-m TanDEM-X DEM was better than that of the 30-m TanDEM-X DEM. It means that downsampling will reduce the quality of DEM, especially in mountainous areas [
35]. It also demonstrates the advantages of high-resolution DEM in describing terrain details [
58]. In the other two study areas with the reference points mainly distributed in flat areas, the overall quality of the 12- and 30-m were very close. Hu et al. [
35] presented similar findings.
Next, we discuss the results obtained with 12-m TanDEM-X DEM as the reference data. The fact that the residuals between DEM models had non-normal distribution was also observed in many previous studies [
13,
20,
22]. The changes of the x-coordinate at the peak of the histogram and the frequency near the peak after co-registration reflect the reduction of systematic errors in the vertical direction [
22] and the increase in the quality of the geographic location, respectively. The decreases in the indicators’ values after co-registration confirmed the conclusions of previous studies [
13,
23] that systematic deviations could reduce the vertical quality of DEM. These also show that the co-registration method used in this study can be used to correct the systematic errors of these DEMs. That the co-registered 30-m TanDEM-X DEM had the best quality is not unexpected because 30 m data was produced from the 12 m data using the moving-window method.
5.2. Quality versus Slope
The slope is the main factor affecting the elevation measurements [
34]. The slope can cause geometric distortion in the processing of optical and SAR images, which will affect the extraction of true elevation. We first discuss the results obtained with ICESat/GLAS points as the reference data. The position where the peaks of indicators appeared indicates that the slope impacts image acquisition. As the slope increased, the quality of DEMs tended to decline, meaning the slope strongly influenced the quality of DEMs produced by InSAR or optical photogrammetry technology. The reasons for this phenomenon may be: (1) As the slope increases, the surface becomes rougher. It is challenging to measure ground elevation with optical photogrammetry and InSAR technology. (2) High-slope areas are mainly vegetation, hindering optical photogrammetry and short-wavelength SAR from measuring the ground beneath the tree canopy. (3) When the slope varies greatly, the elevation has obvious fluctuations. The value of a pixel with a size of 12 or 30 m cannot accurately represent the elevation of the land within the pixel, which can cause a large vertical error. In the flat regions, the elevation has small fluctuations, so the pixel value can basically represent the elevation of the land within the pixel. The quality of DEMs decreased with the increasing slope, which had been proven by many previous studies [
61,
62]. The 30-m DEMs increased more drastically with increasing slopes than the 12-m DEM, which meant that DEMs with higher resolution had a better tolerance for slope changes. When slopes were >25°, the indicators’ values of TanDEM-X DEM with a pixel size of 12 and 30 m were significantly different, which confirmed that higher-resolution DEM could better describe terrain slopes. Previous studies also made similar findings [
35]. The quality of ASTER GDEM had the most dramatic decline in areas with steep slopes among the four DEMs, meaning ASTER GDEM’s vertical quality was most affected by the slope.
The same DEM had different quality in four study areas within the same slope range. The reasons may be: (1) The main land cover types in Sichuan and Xinjiang B are vegetation, while Xinjiang A and Inner Mongolia are mainly bareland. Therefore, the effects of land cover may cause this situation. (2) The slope value of the pixel is not very accurate in Sichuan and Xinjiang B with large slope changes. The pixel value may underestimate the true slope, making the classification of the errors inaccurate.
Next, we discuss the results obtained with 12-m TanDEM-X DEM as the reference data. The phenomenon that the indicators’ values had apparent fluctuations in Xinjiang A and Inner Mongolia could be related to the too few (although > 10) pixels in certain high slope bins, causing the laws in these slope bins not representative enough. In Sichuan and Xinjiang B, where had adequate pixels at high slope bins, there were no such fluctuations. In the interval of 88–89° in Sichuan, the extremely high values indicated that the quality of DEMs was seriously affected at extremely steep places. Since the 30-m TanDEM-X DEM was generated by resampling the 12-m TanDEM-X DEM, these results also show that the quality of resampling results is affected by the slope.
5.3. Quality versus Aspect
The results that the quality of DEM varies with the aspect when taking ICESat/GLAS points as the reference are similar to the previous findings [
24]. The reason for the differentiation of the elevation quality with the aspect may be related to the heading of the satellite sensors in the ascending and the descending orbit [
63] and the incidence angle of the radar sensor to the surface [
51,
57].
The strong relationship between aspects and DEM’s quality is an indicator of data misregistration [
33,
64]. As a result, this rule reflected that the co-registration results between DEMs were good, but high slopes and dense vegetation could influence the co-registration effects.
5.4. Quality versus Land Cover
We first discuss the results obtained with ICESat/GLAS points as the reference data. The indicators’ values of the other three DEMs in the forest were higher than those in the cropland/grassland/bareland, except for ASTER GDEM. It shows that in places with denser vegetation coverage, it is more difficult for the InSAR band to penetrate vegetation, which makes it challenging to obtain the true surface elevation. As a result, the quality of the DEMs in the forest is lower than that in the cropland/grassland/bareland. The quality of ASTER GDEM in the forest was higher than that in the cropland/grassland/bareland on a flat surface. This may be caused by ASTER GDEM’s low sensitivity to land cover, which was also mentioned by previous studies [
14]. There is an interval of the acquisition time between the DEMs and the land cover map, leading to a slight change of the actual local land cover, which may also affect the results.
In Sichuan, the shapes of the polylines in the forest (
Figure 10a) were very close to the shapes in
Figure 7a. The reason should be that there are more reference points in the forest, which dominates the trend of all points changing with slopes in Sichuan. The reason the values in
Figure 7a were higher than the maximum value of the land cover types in
Figure 10a at very steep slopes is that, before calculating the indicators, we used the three-sigma rule to eliminate the outliers. For
Figure 7a, we removed the outliers in each slope range. For
Figure 10a, we removed the outliers in each slope range under every land cover, thus removing more outliers. Therefore, the quality at very steep slopes was improved. That the effect of different land covers on the quality of DEMs was not obvious in Inner Mongolia may be related to the few points in forest and shrubland. Only 10 s to 100 points were distributed in those two land covers under every slope range, which made the law not obvious.
Next, we discuss the results obtained with 12-m TanDEM-X DEM as the reference data. For different land cover types, the indicators’ values in very low or very high slopes were close, and the values in middle slopes were significantly different. This showed that, in flat and extremely steep places, the land cover had a small impact on vertical differences, while, in the middle slope places, the land cover had a great impact on DEM’s quality. It should be noted that the reference DEM, such as the three assessed DEMs, is also DSM, not DTM. In very steep places, the reason the quality of DEMs under different land cover was close should be that the slopes dominated the quality [
34], where the impacts of land cover on DEMs’ differences became little.
5.5. Quality versus Number of Images
A larger number of images could contribute to the increasing quality of the final DEM products. For TanDEM-X DEM and SRTM, the number of images indicates the number of valid height values from different InSAR DEM acquisitions available for being fused to generate the final DEM product. In general, TanDEM-X DEM has a larger number of images than SRTM, which may be one of the reasons that TanDEM-X DEM owns a higher quality. For TanDEM-X DEM, the maximum number of images in Sichuan and Xinjiang B is larger than that in the other two study areas. This may be because Sichuan and Xinjiang B have obvious fluctuations. The previous study has similar findings that the most problematic areas are those with only one valid acquisition of TanDEM-X DEM [
25]. For ASTER GDEM, the number of images represents the number of images stacked to derive the final elevation values. The previous study [
35] also found that the quality of ASTER GDEM got higher with increasing stack numbers.
5.6. Quality versus Local Incidence Angle
In quality versus aspect analysis, the aspects were simply divided into eight directions, starting from true north (
Section 4.1.3). By classifying the aspect into three zones according to the relationship between the local incidence angle and the look angle, the results obtained were clearer. The data opposite to the sensors have the lowest quality, which was found by the previous study on SRTM-X [
15] and DEMs generated from RADARSAT stereopairs [
65]. In Sichuan, the quality for slopes facing towards the radar was higher than that for the flat area, while this relationship was just the opposite in Xinjiang B. This may be partly because that the quality of ICESat/GLAS points in mountainous regions is not high [
25]. This method could provide a new perspective for the quality assessment of SRTM and other DEMs whose incidence angle files are available. This also reminds us that researchers who want to generate a high-quality DEM by fusion of InSAR-derived intermediate DEMs could set low weights for places opposite to the sensors.
5.7. The Reference Data
The 30-m data were downsampled from 12-m resolution DEM using the moving-window smoothing process. Thus, when taking 12-m TanDEM-X DEM as the reference, that the 30-m TanDEM-X DEM had the best quality is not unexpected. Although the results are consistent with our expectations, the achieved quality of 30-m TanDEM-X DEM is still useful. Because these results can reflect that the downsampling process of TanDEM-X DEM is influenced by slope and landcover, and the 30-m TanDEM-X DEM agree better with the reference DEM after co-registration. It should be noted that the rather low elevation differences and unrealistic quality of 30-m TanDEM-X DEM when taking 12-m TanDEM-X DEM as the reference is because they originate from the same data source.
It might be a little confusing why we used the reference ICESat/GLAS points to assess the quality of another reference 12-m TanDEM-X DEM. This does not conflict with our objective that we want to provide some additional insights on the differences’ performance in relation to various influencing factors. Through the consistency performance between 12-m TanDEM-X DEM and ICESat points, we can observe the influence of various factors on the differences between these two data, which is exactly what this article studies. It is also helpful for us to better understand the quality obtained with 12-m TanDEM-X DEM as the reference.
5.8. Data Co-Registration
In this study, we used two kinds of reference data for DEMs’ quality assessment and comparison. In model-to-model comparison, we did co-registration between models using an improved LZD method to eliminate systematic errors. However, we did not perform co-registration when taking ICESat/GLAS points as reference data. This is because: (1) The diameter of the ICESat/GLAS spot can reach 70 m, and the distance between adjacent spots is about 172 m. By contrast, the possible systematic offsets (see
Table 4) are minimal. As a result, co-registration between ICESat/GLAS points and DEMs with 12- and 30-m pixel size is not necessary. (2) The reference points we used are only ICESat/GLAS points, with no other GNSS points, causing the small number and small distribution area of reference points, which can reduce the effect of co-registration. In contrast, the reference DEM is a continuous surface, which is very suitable for co-registration.
It should be noted that the co-registration solves the systematic errors (not only planimetric systematic errors but also vertical systematic errors), so the quality of the evaluated product is not the one that the user of the product will really have. Although the quality analysis of DEMs mainly focused on their vertical quality in this study, we still need to pay attention to the systematic errors. The statistics of nine transformation parameters used in co-registration between the assessed DEMs and the reference DEM (
Table 4) show the systematic errors existed in 3D direction, which the researchers need to notice to make a real evaluation of each DEM.
5.9. Limitations of Our Study
The source data of the DEMs and the land cover map were obtained at different times, so the vegetation growth and topographic changes during this period would also affect the quality assessment [
35]. To minimize the potential impact changing with time, we visually inspected each obvious land cover change in Google Earth, referring to Hawker et al. [
24], and confirmed that no major changes occurred. However, small changes in local places could affect the experimental results. Future research should pay attention to this situation.
When exploring the relationship between land cover and DEM’s quality, we assessed the quality under different land cover in each slope range to reduce the impact of slope on results. Future studies could also pay attention to the conjoint analysis of slope/aspect.
6. Conclusions
There are many types of global DEMs distributed by different agencies, and the elevations of different DEMs are not the same. DEM’s quality affects the results of subsequent applications. SRTM and ASTER GDEM are the most commonly used global DEMs, and TanDEM-X DEM has attracted users’ attention due to its unprecedented accuracy. TanDEM-X DEM and SRTM are InSAR products, and ASTER GDEM is a photogrammetric product. To provide some additional insights on quality assessment of 12- and 30-m resolution TanDEM-X DEMs, 30-m resolution ASTER GDEM and 30-m resolution SRTM, this study assessed differences’ performance in relation to geographical features and the ways in which DEMs have been created on selected Chinese sites, taking ICESat/GLAS points with 14-cm absolute vertical accuracy but size of 70-m diameter and 12-m resolution TanDEM-X DEM with less than 10-m absolute vertical accuracy as the reference data.
Differences’ performance shows good consistency under the assessment using two reference data. Results show that: (1) TanDEM-X DEMs have the best overall quality, followed by SRTM. ASTER GDEM has the worst quality. InSAR technology has superiority in obtaining elevation in vegetation coverage areas compared with optical technology. The fusion of multi-baseline and multi-aspect TanDEM-X DEMs improves the quality of the final TanDEM-X DEM product. The 12-m TanDEM-X DEM has significant advantages in describing terrain details. Downsampling will reduce the quality of DEM, especially in mountainous areas. (2) The slope can cause geometric distortion in the processing of optical and SAR images, which can affect the extraction of true elevation. The quality of DEMs decreases with increasing slopes. ASTER GDEM’s vertical quality is most affected by the slope. SRTM is the second most sensitive to slopes. The 12-m TanDEM-X DEM has a better tolerance for slope changes. (3) TanDEM-X DEM has a larger value in NW and a smaller value in SW. In places where slopes are generally small, the changes of ASTER GDEM and SRTM with aspects are not obvious. In places of large fluctuations, the quality of ASTER GDEM and SRTM are the lowest near N and the highest near S. After the co-registration, the assessed 30-m resolution DEMs have close quality in eight aspects, especially in areas where slopes are generally small. High slopes and dense vegetation could influence co-registration effects. (4) For TanDEM-X DEM and SRTM, the denser is the vegetation cover, the lower is the quality of DEMs because, in places with denser vegetation coverage, it is more difficult for the InSAR band to penetrate vegetation, which makes it challenging to obtain the true surface elevation. ASTER GDEM has low sensitivity to land cover. If there are more reference points in a certain land cover, the shapes of the polylines describing the change of quality with slope under that land cover are consistent with the shapes of the polylines describing the change of quality with slope under all land cover types. (5) The DEMs were created from the fusion of many observations. The quality of DEMs gets higher with the increasing number of images used in the fusion process. (6) For SRTM, which is created from SAR images, the local incidence angle can impact the quality of the DEM as the geometry of the incidence angle strongly influences the quality of the reflected radar beam. The quality in places where the slopes are opposite to the radar beam is the worst. This method could provide a new perspective for the quality assessment of SRTM and other DEMs whose incidence angle files are available. Researchers who want to generate a high-quality DEM by fusion of InSAR-derived intermediate DEMs could set low weights for places opposite to the sensors. (7) The residuals between DEM models had non-normal distribution even after co-registration. The quality of the assessed DEMs increased after co-registration. Each transformation parameter using for co-registration is nearly in the same order of magnitudes in different study areas for different assessed DEMs.
In this study, our results concluded from study areas in China are similar to the results from other areas according to previous studies. Researchers who want to know the quality of a DEM in order to use it in further applications should pay more attention to the terrain factors and land cover in their study areas and the ways in which the DEM has been created. It should be noted that the statistics in the study are influenced by tree heights, especially in hilly and mountainous places. In model-to-model comparison, the quality results of the assessed three 30-m resolution DEMs are restricted by the quality of the reference 12-m resolution TanDEM-X DEM, which is the best we could utilize at this time. Further evaluation is necessary once finer and better reference DEM becomes available. The improved LZD method applied to co-registration is one of our highlights. However, this paper does not focus on the comparison of the co-registration results between our method and other methods for these global DEMs, which is also lacking in the current research. We only describe the reasons and advantages of using the LZD method from a theoretical level. Finally, with more global DEMs distributed, increasing the types of assessed DEMs is necessary for future studies.