Quality Assessment of TanDEM-X DEMs, SRTM and ASTER GDEM on Selected Chinese Sites

: Digital elevation models (DEMs) are the basic data of science and engineering technology research. SRTM and ASTER GDEM are currently widely used global DEMs, and TanDEM-X DEM, released in 2016, has attracted users’ attention due to its unprecedented accuracy. These global datasets are often used for local applications and the quality of DEMs affects the results of applications. Many researchers have assessed and compared the quality of global DEMs on a local scale. To provide some additional insights on quality assessment of 12- and 30-m resolution TanDEM-X DEMs, 30-m resolution ASTER GDEM and 30-m resolution SRTM, this study assessed differences’ performance in relation to not only geographical features but also the ways in which DEMs have been created on selected Chinese sites, taking ICESat/GLAS points with 14-cm absolute vertical accuracy but size of 70-m diameter and 12-m resolution TanDEM-X DEM with less than 10-m absolute vertical accuracy as the reference data for comprehensive quality evaluation. When comparing the three 30-m DEMs with the reference DEM, an improved Least Z-Difference (LZD) method was applied for co-registration between models, and Quantile–Quantile (Q-Q) plot was used to identify if the DEM errors follow a normal distribution to help choose proper statistical indicators accordingly. The results show that: (1) TanDEM-X DEMs have the best overall quality, followed by SRTM. ASTER GDEM has the worst quality. The 12-m TanDEM-X DEM has signiﬁcant advantages in describing terrain details. (2) The quality of DEM has a strong relationship with slope, aspect and land cover. However, the relationship between aspect and vertical quality weakens after data co-registration. The quality of DEMs gets higher with the increasing number of images used in the fusion process. The quality in where slopes opposite to the radar beam is the worst for SRTM, which could provide a new perspective for quality assessment of SRTM and other DEMs whose incidence angle ﬁles are available. (3) Systematic deviations can reduce the vertical quality of DEM. The differences have non-normal distribution even after co-registration. For researchers who want to know the quality of a DEM in order to use it in further applications, they should pay more attention to the terrain factors and land cover in their study areas and the ways in which the DEM has been created.


Introduction
Digital elevation models (DEMs) and terrain variables derived from DEMs (e.g., slope and aspect) are the basic data of science and engineering technology research. They play a necessary role in geomorphology [1], volcanology [2], glacier mass balance [3], flooding modeling [4], climatic modeling [5], etc. DEMs are now primarily generated using remote sensing techniques because of the benefits that fewer people can map a large spatial area at a lower cost [6]. Remote sensing techniques include photogrammetry, Interferometric Synthetic Aperture Radar (InSAR) and Light Detection And Ranging (LiDAR) [7].
Errors of DEM can propagate throughout the data processing in subsequent investigations they are used in and may adversely affect the accuracy of the results [8]. Therefore, it There are two main methods for evaluating the accuracy of DEMs currently [14]: comparing the assessed DEM (i.e., the DEM to be assessed) with higher accuracy points and DEMs, respectively. When taking a DEM as the reference, the assessed DEMs need to be registered with the reference DEM because there exist systematic offsets between the models in both horizontal and vertical directions due to the effect of sensor errors, the orientation bias of reference ellipsoids and topographic relief [35]. Some disadvantages of the related studies on co-registration were that the registration accuracy depended on the search step distance because of indirect computation [13,23], only the existence of translations was considered [13,15,23,30,35,36] or the co-registration procedure relied on manual recognition [16].
Aiming to the limitations of the existing studies on quality assessment of the DEMs, the objective of this article is to provide some additional insights on the quality assessment of TanDEM-X DEMs, ASTER GDEM and SRTM. We selected study areas in China because there is a necessity to test whether the assessed results are special in China. Due to the limit of the conditions, we could not obtain the ground truth data. High-resolution and highaccuracy data are almost classified in every country. It is very hard to obtain, especially in China. Thus, this study is not the traditional accuracy assessment of the DEMs, but the quality assessment. DEMs' quality includes many aspects, not only the absolute accuracy but also differences' performance in relation to various influencing factors, which is exactly what this article studies. The consistency between these DEMs and the reference data can indicate the quality. The higher is the accuracy of the reference data, the better, so that our results could provide more convincing guidance on DEMs' quality for related application researchers. As a result, ICESat/GLAS points with 14-cm absolute vertical accuracy [37] were selected as the reference points. It should be noted that one ICESat/GLAS point has a diameter of 70 m on the ground, which is larger than the resolution of most DEMs, and the center spacing of adjacent points on the same track is about 172 m [37]. Moreover, the accuracy of ICESat/GLAS points in mountainous regions is not high [25]. Due to the small number, non-uniform distribution and lack of describing the terrain details of ICESat/GLAS points in the wide range of the area [22], taking a high-resolution and high-accuracy DEM with continuous and complete coverage as the reference to assess the quality of DEMs is also suitable. The 12-m resolution TanDEM-X DEM with less than 10 m absolute vertical accuracy [21] is the best we could utilize at this time. More specific goals of this article are as follows: (1) We assess and compare the vertical quality of 12and 30-m resolution TanDEM-X DEM, 30-m resolution ASTER GDEM and 30-m resolution SRTM DEM on four selected Chinese sites; analyze the quality in terms of slope, aspect and landcover; control the variable of the slope when analyzing the relationship between land cover and quality; analyze the quality in terms of the number of images used in the fusion process for generating the final DEMs; and analyze the quality in terms of local incidence angle of SRTM C-band DEM for the first time, trying to provide a new perspective for quality assessment of SRTM and other DEMs whose incidence angle files are available. (2) We use an improved Least Z-Difference (LZD) method [38] which considers nine transformation parameters that may exist between two DEMs to accomplish the co-registration procedure. (3) We use a Quantile-Quantile (Q-Q) plot to identify if the DEM errors follow a normal distribution and choose proper statistical indicators accordingly.

Study Area
Four areas, each of which is a 1 • × 1 • tile, were chosen as study areas (Figure 1) in China based on two main criteria [32]: (1) the availability of the 12-and 30-m TanDEM-X DEM, as they are not openly available; and (2) the diversity of the terrain slope and the land cover, as they are the main factors known to influence DEMs' quality. Thus, we chose these four study areas because the TanDEM-X DEMs were available and there were different topographic conditions (flat, hilly and mountainous areas) coexisting with various land cover types, making these areas suitable for a comprehensive DEM quality evaluation (Table 1). We named the study areas after the province where they are located. The study areas of Xinjiang A and Xinjiang B are adjacent. Each study area contains a small or most mountainous region. Most areas of Sichuan are high mountains. About 56% of Sichuan is covered by forest and about 40% is covered by cropland/grassland/bareland. There are large mountainous areas in the northeast of Xinjiang B. About 16% of Xinjiang B is covered by forest and about 74% is covered by cropland/grassland/bareland. There are small mountainous areas in the northeast of Xinjiang A. About 91% of Xinjiang A is covered by cropland/grassland/bareland. There are small and long mountainous areas in the north of Inner Mongolia. About 86% of Inner Mongolia is covered by cropland/grassland/bareland, about 3% is covered by forest and about 2% is covered by shrubland.
Remote Sens. 2021, 13,1304 4 of 32 land cover types, making these areas suitable for a comprehensive DEM quality evaluation (Table 1). We named the study areas after the province where they are located. The study areas of Xinjiang A and Xinjiang B are adjacent. Each study area contains a small or most mountainous region. Most areas of Sichuan are high mountains. About 56% of Sichuan is covered by forest and about 40% is covered by cropland/grassland/bareland. There are large mountainous areas in the northeast of Xinjiang B. About 16% of Xinjiang B is covered by forest and about 74% is covered by cropland/grassland/bareland. There are small mountainous areas in the northeast of Xinjiang A. About 91% of Xinjiang A is covered by cropland/grassland/bareland. There are small and long mountainous areas in the north of Inner Mongolia. About 86% of Inner Mongolia is covered by cropland/grassland/bareland, about 3% is covered by forest and about 2% is covered by shrubland.
Sichuan Xinjiang A Xinjiang B Inner Mongolia    Table 2 shows the characteristics of the elevation data used in this study. TerraSAR-X add-on for Digital Elevation Measurements (TanDEM-X) mission is a constellation of two satellites with almost identical design parameters, flying in a close orbit formation with a baseline of 250-500 m and acquiring radar images of the Earth surface at high spatial resolution [21,[39][40][41][42]. The 12-m resolution TanDEM-X DEM, the main product of the mission, is expected to go beyond the quality of global DEMs available so far with an unprecedented geometric resolution, precision and accuracy [18,19]. The configuration of the sensors and orbit allowed across-track and along-track interferometry. The mission accomplished complete imaging of Earth at least twice, with additional coverage in areas of complex topography, including the repositioning of the orbits to avoid radar shadowing in mountainous terrains [19,33,43,44]. Thus far, the data acquired during the mission between December 2010 and January 2015 resulted in three global datasets [25]: (1) TanDEM-X DEM produced by the DLR at a 0.4 (~12 m) and 1 arcsec (~30 m) resolution released in September 2016, which are available for scientific use (the 30-m resolution DEM was resampled from 12-m resolution DEM using the moving-window smoothing process); (2) WorldDEM from Airbus Defense and Space for commercial use at a 12-m resolution; and (3) freely available TanDEM-X DEM at 90-m resolution. We used the 12-and 30-m resolution TanDEM-X DEM provided by the DLR.

Elevation Data
The Shuttle Radar Topography Mission (SRTM) was a cooperation among NASA, the U.S. National Geospatial-Intelligence Agency (NGA), U.S. Department of Defense (DoD), DLR and the Agenzia Spaziale Italiana (ASI, Italy) [33]. The shuttle carrying two SAR antennas with a baseline of 60 m can acquire radar images of the Earth surface simultaneously. Farr et al. [11] gave a detailed review of the SRTM mission. The data were acquired during 11 days in February 2000. The C band system (5.6 cm, SIR-C) and X band system (3.1 cm, X-SAR) operated during the mission, with the C radar generating a continuous mapping coverage and the X radar generating data along discrete swaths 50 km wide [11]. The SRTM data used in this study are from the C band system. In 2015, the 1 arcsec resolution (~30 m) SRTM V3 data in China was publicly released, which was Remote Sens. 2021, 13, 1304 6 of 32 used in this study. It was downloaded from the website (http://earthexplorer.usgs.gov/ (accessed on 2 January 2015)).
The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) sensor was launched in December 1999 onboard the Terra satellite. It can generate alongtrack stereoscopic images on the Near Infra-Red wavelength (0.78-0.86 µm) [45]. In 2019, the 1 arcsec resolution (~30 m) ASTER GDEM V3 data were publicly released by the NASA Jet Propulsion Lab (JPL) and Japan's Ministry of Economy, Trade, and Industry (METI), which was used in this study. It was downloaded from the website (https://search. earthdata.nasa.gov/ (accessed on 5 August 2019)).
It should be noted that the DEMs used in this study refer to Digital Surface Models (DSM). This is due to the weak penetration of TanDEM-X DEM and SRTM radar signals in dense vegetation and the sensitivity to the canopy of optical images used for generating ASTER GDEM.
In January 2003, Ice, Cloud, and land Elevation Satellite (ICESat) was launched by NASA. The Geo-science Laser Altimetry System (GLAS) carried on the satellite is mainly used to measure the height of the ice cover and its changes over time, the shape of clouds and aerosols, the height of land and the thickness of vegetation and the thickness of sea ice [46]. GLAS adopts the nadir direction observation mode. The laser pulse can form a spot with a diameter of 70 m on the ground, and the center spacing of adjacent laser spots on the same track is about 172 m [37]. At present, the standard ICESat/GLAS data products released by the National Snow and Ice Data Center (NSIDC) have 15 categories. GLAH14 is the altimetry data of the land surface reproduced from GLAH05 and GLAH06. Compared with the current DEMs, GLAH14 has higher accuracy. Thus, it is often used as the reference data for quality assessment of DEMs. This ICESat/GLAS GLAH14 was used in this study. It was downloaded from the website (https://nsidc.org/data/GLAH1 4/versions/34 (accessed on 1 August 2017).

Land Cover
Gong et al. applied the training sample set collected in 2015 to Sentinel-2 images acquired in 2017 to produce a 10-m resolution global land cover map, FROM-GLC10, with random forest classifier [47]. The dataset has 10 classes, namely cropland, forest, grassland, shrubland, wetland, water, tundra, impervious surface, bareland and snow/ice, with an overall accuracy of 72.76%. This dataset was used as ancillary data to assist our assessment. It was downloaded from the website (http://data.ess.tsinghua.edu.cn/fromglc10_2017v01. html (accessed on 8 March 2019).

Methods
There are two main methods for evaluating the accuracy of DEMs currently [14]. One is comparing the DEM with higher accuracy points, such as the GPS data and elevation control points collected from a large-scale topographic map. The accuracy of the reference points should be at least three times better than that of the DEM to be assessed. Another is comparing the DEM with higher accuracy DEMs, such as the DEM data generated by large-scale topographic maps, LiDAR DEMs and other higher-accuracy DEMs. Due to the lack of ground truth data, we cannot assess the absolute vertical accuracy of the DEMs in this study. We aim to assess differences' performance to explore the consistency between the assessed DEMs and the reference data. The consistency can also reflect the quality of DEMs under the circumstance that the reference data is better than the assesses DEMs. Moreover, we can still borrow the above ideas for evaluating the accuracy of DEMs to assess DEMs' quality in this study.
ICESat/GLAS GLAH14 points with 14-cm absolute vertical accuracy but size of 70-m diameter were chosen as the reference points to assess and compare the vertical quality of TanDEM-X DEM with a pixel size of 12 and 30 m and ASTER GDEM, SRTM with a pixel size of 30 m. It is also should be noted that the vertical accuracy of ICESat/GLAS points in mountainous regions is not high. Due to the small number, non-uniform distribution Remote Sens. 2021, 13, 1304 7 of 32 and lack of describing the terrain details of ICESat/GLAS points in the wide range of the study area, taking a high-resolution and high-accuracy DEM with continuous and complete coverage as the reference to assess the quality of DEMs is also suitable. Thus, 12-m resolution TanDEM-X DEM with less than 10 m absolute vertical accuracy, which is the best we could utilize at this time, was chosen as the reference DEM of the other three DEMs with a pixel size of 30 m to perform a model-to-model comparison. Previous studies [28,33,48] have also used the 12-m TanDEM-X DEM as the reference to test the height quality and the performance in applications of many independent datasets. Through the analysis of the height differences' performance between the assessed DEMs and the reference, we could evaluate the quality of the DEMs. Figure 2 describes the general workflow of this study.

Data Preprocessing
Since the horizontal and vertical datums of these elevation data are different (Table 2), they need to be unified [15]. The plane position difference of the WGS84 ellipsoid and the Topex/Poseidon ellipsoid is very small [49,50], so the difference in the vertical direction is mainly considered [51]. The WGS84 ellipsoid, the native vertical datum of TanDEM-X DEM, was selected as the vertical reference datum in this study.
The vertical datum of ASTER GDEM and SRTM is the EGM96 geoid. The deviations of the EGM96 from the WGS84 ellipsoid provided by GAMMA software were used to convert the vertical datums of these two DEMs. The vertical datum of ICESat/GLAS points is the Topex/Poseidon ellipsoid. The MATLAB tool provided by NSIDC was used to read the ICESat/GLAS data, and the parameter d_deltaEllip was used to convert the vertical datum from the T/P ellipsoid to the vertical reference datum. Two quality control parameters provided by NSIDC were selected to remove low-quality ICESat/GLAS points. The first parameter elev_use_flg indicates whether the elevation value should be used or not, and only the data with a value of 0 were selected. The second parameter sat_corr_flg indicates whether saturation phenomenon occurred during the acquisition and if it was corrected, and only the non-saturated points were selected. Moreover, the ICESat/GLAS points whose absolute difference with other DEMs is greater than 50 m were removed. We kept the 50-m threshold to eliminate the gross errors of the ICESat/GLAS points so as to ensure a high-quality database [52], especially considering that the accuracy of ICESat/GLAS points in mountainous regions is not high [25]. Finally, all elevation data were projected to the UTM coordinate system using ArcGIS [53].

Data Co-Registration
When performing the model-to-model comparison, the assessed DEMs need to be registered with the reference DEM. Although the datum unifying is implemented, there still exist systematic offsets between the assessed DEMs and the reference DEM in both horizontal and vertical directions due to the effect of sensor errors, the orientation bias of reference ellipsoids and topographic relief [35]. An improved Least Z-Difference (LZD) method was applied to data co-registration in this study. Generally, the DEM co-registration model can be expressed by Equation (1) [38]: where X, Y and Z are the three-dimensional coordinates of the reference DEM, respectively.
x, y and z are the three-dimensional coordinates of the assessed DEM, respectively. T X , T Y and T Z are the translations in three-dimensional direction of the assessed DEM, respectively. M is the three-dimensional scale factor of the assessed DEM. R is the rotation matrix (Equation (2)).
= sinθ x × cosθ z + cosθ x × sinθ y × sinθ z , r13 = sinθ y , r23 = −sinθ x × cosθ y and r33 = cosθ x × cosθ y . θ x , θ y and θ z are three rotation angles of the assessed DEM around three axes, respectively. The co-registration process of two DEMs includes two steps. The first step, which is the key procedure, is solving the conversion parameters T X , T Y , T Z , M, θ x , θ y and θ z (Equation (1)). The second step is substituting the solved parameters into Equation (1) and then converting all points of the DEM to be registered to obtain a registered DEM.
The Least Z-Difference (LZD) algorithm proposed by Rosenholm and Torlegard [38] is a surface matching algorithm aiming at DEM co-registration in the field of photogrammetry based on the least square method. The LZD algorithm established an observation equation (Equation (3)) for each grid cell on the DEM to be registered based on the sum of squares of height differences between it and the corresponding point on the reference DEM is minimum.
where λ is the height difference between the assessed DEM and the reference DEM, m = M − 1 and d f dX and d f dY are the elevation changes of the reference DEM in its X and Y directions per unit length, respectively.
For each grid cell on the DEM to be registered, there is an observation equation. The seven transformation parameters T X , T Y , T Z , M, θ x , θ y and θ z can be solved using the least square method. Finally, we can use these seven parameters to transform the original coordinates of the DEM to be registered to complete the DEM registration.
However, in the co-registration process of two DEMs, the three-dimensional scale factor may be different. Therefore, we considered that the three axes might correspond to Remote Sens. 2021, 13, 1304 9 of 32 different scale factors (Equation (4)). This improved LZD method we proposed increases the number of transformation parameters from seven to nine.
where mx, my and mz are the three-dimensional scale factor minus 1, respectively. The new observation equation is shown as Equation (5):

Quality Assessment of DEMs
The elevation information of DEM is its important application. Therefore, the quality analysis of DEM mainly focused on its vertical quality in this study. Residuals are the height differences between the assessed DEM and the reference data at the corresponding point, which can be used to measure the vertical quality of the DEM. Mean Error (ME) (Equation (6)), Root Mean Square Error (RMSE) (Equation (7)) and Standard Deviation (STD) (Equation (8)) are the most widely used indicators for describing the characteristics of errors. In the case, where the errors follow a normal distribution, these indicators perform well [22,54,55]. However, the errors follow a non-normal distribution in most cases of DEMs [54,55]. In the case of non-normality, Median Error (MED) (Equation (9)), Median Absolute Deviation (MAD) (Equation (10)) and LE90 are more robust quality indicators [54]. For normally distributed observations, the linear error at 90% confidence level is LE90 = STD × 1.65. As height differences tend to follow a non-normal distribution, here the LE90 is directly equated to the 90th percentile of the sorted absolute differences calculated by the minimum rank method, i.e., the smallest value in the list. In other words, 90% of the data are less than or equal to that value [22,54].
where H i di f f is the height difference between the assessed DEM and the reference data at the ith point and N is the total number of errors.
Outliers corrupt statistical accuracy measures, so the three-sigma rule was used to remove the errors satisfying Equation (11). All statistical measures presented in Section 4 are outlier-removed.
To check whether the DEM errors in this study follow a normal distribution, a normality test should be conducted. The Quantile-Quantile (Q-Q) plot can identify whether a large number of data approximates a normal distribution. It is a scatter plot with the quantiles of the residuals on the vertical axis and the quantiles of the standard normal distribution value on the horizontal axis. If the actual distribution is normal, the Q-Q plot should yield a straight line. Statistical tests can also be used to investigate whether data originate from a normal distribution, but, in the case of large datasets or outliers, these tests are often rather sensitive. Therefore, we prefer the visual method of the Q-Q plot [54].
Then, we analyzed the quality in terms of slope, aspect and land cover. We also focused on the ways in which these DEMs have been created. Since TanDEM-X DEM, ASTER GDEM and SRTM were created from the fusion of many observations, we evaluated the effect of the number of images used in the fusion process on their quality. For TanDEM-X DEM and SRTM, which were created from SAR images, the local incidence angle may impact the quality of the DEM as the geometry of the incidence angle strongly influences the quality of the reflected radar beam [15]. Because we can easily get all the original local incidence angle files used to produce the final combined SRTM in each study area, we further evaluated the effect of the local incidence angle on SRTM's quality.
To assess the vertical quality in terms of slope, we calculated the slope of every pixel in the study areas based on the TanDEM-X DEM with a pixel size of 12 m using ArcGIS: Slope degrees = Slope radians × 57.29578, where dz dx and dz dy are the elevation changes in X and Y directions per unit length, respectively. When using ICESat/GLAS as the reference data, we divided the slope into six levels referring to Wang et al. [56]: 0-2 • (flat), 2-7 • (undulating), 7-15 • (soft), 15-25 • (gentle), 25-35 • (steep) and more than 35 • (very steep). The elevation differences were then summarized according to the binned slopes. MAD and LE90 were calculated within each slope level for assessing DEM's quality. When using TanDEM-X DEM as the reference data, the slope bin was chosen as one degree because of the large number of differences in each bin. MAD and LE90 were also calculated.
To assess the vertical quality in terms of aspect, we also calculated the aspect of every pixel in the study areas based on the TanDEM-X DEM with a pixel size of 12 m using ArcGIS, and then the aspect was divided into eight directions: N (337. 5 Vertical quality is also correlated to land cover. Some land cover types, such as forests, vegetation, waters, etc., will affect the elevation quality obtained by ranging and image matching. We used FROM-GLC10 land cover map with a 10-m finer resolution distributed by Gong et al. [47]. It was then bilinearly interpolated to the position of reference data after UTM projection. Ten land-cover classes were defined in the original model, but, for simplicity of analysis, three broad classes were defined referring to Varga and Bašić [13]: (1) cropland, grassland and bareland; (2) shrubland; and (3) forest. No permanent snow or ice was found in study areas. Due to their characteristics, snow, ice, water bodies and other land cover types were not considered [14]. Because of the complex topography and land cover in China, the elevation error may be affected by a combination of many factors [57]. To avoid the influence of slope, the quality of DEM under different land cover was assessed in each slope range.
When assessing the vertical quality in terms of the number of images used in the fusion process, we only took ICESat/GLAS as the reference data, avoiding the number of TanDEM-X DEM images from affecting the results of other DEMs. The number of images of each DEM was extracted at every reference point. MAD and LE90 were calculated within each quantity level.
When assessing SRTM's quality in terms of the local incidence angle, we still only used ICESat/GLAS as the reference data. The local incidence angle was classified into three zones: (1) areas where the local incidence angle was less than the look angle, indicating a surface facing towards the radar; (2) areas where the local incidence angle was greater than the look angle, indicating a surface opposite to the radar; and (3) areas where the local incidence angle was equal to the look angle, indicating a flat surface ( Figure 3). There were several local incidence angle images at each point. The zones of the point on different images maybe not the same. To avoid the joint influence and get pure results, we only analyzed the points which were in the same zone on all local incidence angle images. MAD and LE90 were calculated for each zone.
Remote Sens. 2021, 13, 1304 11 of 32 three zones: (1) areas where the local incidence angle was less than the look angle, indicating a surface facing towards the radar; (2) areas where the local incidence angle was greater than the look angle, indicating a surface opposite to the radar; and (3) areas where the local incidence angle was equal to the look angle, indicating a flat surface ( Figure 3). There were several local incidence angle images at each point. The zones of the point on different images maybe not the same. To avoid the joint influence and get pure results, we only analyzed the points which were in the same zone on all local incidence angle images. MAD and LE90 were calculated for each zone.  Figure 4 shows the spatial distribution of elevation and ICESat/GLAS points in four study areas. Figure 5 shows the quality indicators of the four assessed DEMs using box plots. Statistics of the indicators are in Table S1.   Figure 4 shows the spatial distribution of elevation and ICESat/GLAS points in four study areas. Figure 5 shows the quality indicators of the four assessed DEMs using box plots. Statistics of the indicators are in Table S1.

Overall Quality
Remote Sens. 2021, 13, 1304 11 of 32 three zones: (1) areas where the local incidence angle was less than the look angle, indicating a surface facing towards the radar; (2) areas where the local incidence angle was greater than the look angle, indicating a surface opposite to the radar; and (3) areas where the local incidence angle was equal to the look angle, indicating a flat surface ( Figure 3). There were several local incidence angle images at each point. The zones of the point on different images maybe not the same. To avoid the joint influence and get pure results, we only analyzed the points which were in the same zone on all local incidence angle images. MAD and LE90 were calculated for each zone.  Figure 4 shows the spatial distribution of elevation and ICESat/GLAS points in four study areas. Figure 5 shows the quality indicators of the four assessed DEMs using box plots. Statistics of the indicators are in Table S1.   In each study area, the changing trend of different quality indicators' values was the same ( Figure 5). That is, if the LE90 of a certain DEM is larger than that of other DEMs, then the RMSE, STD, MAD and MED of the DEM are also larger than that of other DEMs. The number of ICESat/GLAS points in Sichuan was significantly lower than that in the other three areas (Table S1). For all study areas and DEMs, except for a few cases, ME were positive. The RMSE and STD were very close. The 12-and 30-m TanDEM-X DEM showed the best quality, with MAD varying from 0.27 to 4.25 m and LE90 varying from 0.99 to 13.79 m. ASTER GDEM had the worst quality, with MAD varying from 2.81 to 8.51 m and LE90 varying from 8.68 to 23.74 m. The order of the quality of TanDEM-X DEM, ASTER GDEM and SRTM is consistent with their nominal accuracy ( Table 2). We compared the quality of X-band TanDEM-X DEM and C-band SRTM with a pixel size of 30 m obtained by InSAR technology, finding that the TanDEM-X DEM had higher quality. In Sichuan and Xinjiang B, the LE90 of the 30-m TanDEM-X DEM was about 0.6 m higher than that of the 12-m TanDEM-X DEM, indicating that the quality of the 30-m TanDEM-X DEM was worse. Xinjiang A and Inner Mongolia have lower elevation, and the reference points are mainly distributed in flat areas. In these two areas, the LE90 of 12-m TanDEM-X DEM and 30-m TanDEM-X DEM were very close. Among the four areas, the quality indicators of Sichuan were the highest, which meant that the vertical quality of DEMs in Sichuan was the worst. DEMs in Xinjiang B had the second-worst quality. DEMs in Xinjiang A and Inner Mongolia had better quality. In each study area, the changing trend of different quality indicators' values was the same ( Figure 5). That is, if the LE90 of a certain DEM is larger than that of other DEMs, then the RMSE, STD, MAD and MED of the DEM are also larger than that of other DEMs. The number of ICESat/GLAS points in Sichuan was significantly lower than that in the other three areas (Table S1). For all study areas and DEMs, except for a few cases, ME were positive. The RMSE and STD were very close. The 12-and 30-m TanDEM-X DEM showed the best quality, with MAD varying from 0.27 to 4.25 m and LE90 varying from 0.99 to 13.79 m. ASTER GDEM had the worst quality, with MAD varying from 2.81 to 8.51 m and LE90 varying from 8.68 to 23.74 m. The order of the quality of TanDEM-X DEM, ASTER GDEM and SRTM is consistent with their nominal accuracy ( Table 2). We compared the quality of X-band TanDEM-X DEM and C-band SRTM with a pixel size of 30 m obtained by InSAR technology, finding that the TanDEM-X DEM had higher quality. In Sichuan and Xinjiang B, the LE90 of the 30-m TanDEM-X DEM was about 0.6 m higher than that of the 12-m TanDEM-X DEM, indicating that the quality of the 30-m TanDEM-X DEM was worse. Xinjiang A and Inner Mongolia have lower elevation, and the reference points are mainly distributed in flat areas. In these two areas, the LE90 of 12-m TanDEM-X DEM and 30-m TanDEM-X DEM were very close. Among the four areas, the quality indicators of Sichuan were the highest, which meant that the vertical quality of DEMs in Sichuan was the worst. DEMs in Xinjiang B had the second-worst quality. DEMs in Xinjiang A and Inner Mongolia had better quality. Figure 6 shows the spatial distribution of slopes in four study areas. In Sichuan, there were more than 5000 reference points distributed in areas where the slopes are >25 • , accounting for about 78% of the total reference points. Only seven points were distributed in areas with slopes <2 • . In Xinjiang A, there were more than 10,000 reference points distributed in areas with slopes <7 • , accounting for about 92% of the total points. Fewer than 40 points were distributed in areas with slopes >25 • . In Xinjiang B, there were about 75% of the points distributed in areas with slopes >15 • . In Inner Mongolia, about 68% of the points were distributed in areas with slopes <7 • . The slope interval where the number of pixels was less than ten was considered invalid. We used two quality indicators, MAD and LE90, to visually display the change of quality of DEMs with slopes ( Figure 7). Statistics of the indicators are in Table S2.  Figure 6 shows the spatial distribution of slopes in four study areas. In Sichuan, there were more than 5000 reference points distributed in areas where the slopes are >25°, accounting for about 78% of the total reference points. Only seven points were distributed in areas with slopes <2°. In Xinjiang A, there were more than 10,000 reference points distributed in areas with slopes <7°, accounting for about 92% of the total points. Fewer than 40 points were distributed in areas with slopes >25°. In Xinjiang B, there were about 75% of the points distributed in areas with slopes >15°. In Inner Mongolia, about 68% of the points were distributed in areas with slopes <7°. The slope interval where the number of pixels was less than ten was considered invalid. We used two quality indicators, MAD and LE90, to visually display the change of quality of DEMs with slopes ( Figure 7). Statistics of the indicators are in Table S2.   The peaks of indicators appeared where slopes are greater than 25° ( Figure 7). As the slope increased, MAD and LE90 tended to increase. The 30-m pixel size DEMs increased more drastically with increasing slopes than the 12-m pixel size TanDEM-X DEM. The MAD and LE90 of ASTER GDEM had the most dramatic increase in areas with steep slopes among the four DEMs. SRTM was the second most sensitive to slopes. When slopes were <25°, the indicators' values of TanDEM-X DEM with a pixel size of 12 and 30 m were very close. When slopes were >25°, the values were significantly different. The MAD and LE90 of 12-m pixel size DEM were lower than those of 30-m pixel size DEM. The same DEM had different quality in four study areas within the same slope range. The values of indicators in Sichuan and Xinjiang B were larger than those in Xinjiang A and Inner Mongolia under the same slopes. Figure S1 shows the spatial distribution of aspects in four study areas. MAD and LE90 were used to visually display the change of quality of DEMs with aspects ( Figure 8). TanDEM-X DEM had a larger value in NW and a smaller value in SW in four study areas. In Xinjiang A and Inner Mongolia, where slopes are generally small, the changes of AS-TER GDEM and SRTM with aspects were not obvious. In Sichuan and Xinjiang B, where have large fluctuations, MAD and LE90 were the largest near N and the smallest near S.  Figure S1 shows the spatial distribution of aspects in four study areas. MAD and LE90 were used to visually display the change of quality of DEMs with aspects ( Figure 8). TanDEM-X DEM had a larger value in NW and a smaller value in SW in four study areas. In Xinjiang A and Inner Mongolia, where slopes are generally small, the changes of ASTER GDEM and SRTM with aspects were not obvious. In Sichuan and Xinjiang B, where have large fluctuations, MAD and LE90 were the largest near N and the smallest near S. Figure 9 shows the spatial distribution of land cover in four study areas. MAD and LE90 were used to visually display the change of quality of DEMs with land cover under different slopes (Figure 10). When there were less than 10 points in the conditional interval, we did not count the interval's information. The quality of DEMs decreased with the increasing slope under the same land cover.

Quality versus Land Cover
There was no shrubland in Sichuan, so only forest and cropland/grassland/bareland were compared. Sixty-two percent of the reference points were in the forest and 38% of the points were in the cropland/grassland/bareland. In each slope range of Sichuan, the LE90 of the other three DEMs in the forest were higher than those in the cropland/grassland/bareland, except for ASTER GDEM.  Figure 9 shows the spatial distribution of land cover in four study areas. MAD and LE90 were used to visually display the change of quality of DEMs with land cover under different slopes (Figure 10). When there were less than 10 points in the conditional interval, we did not count the interval's information. The quality of DEMs decreased with the increasing slope under the same land cover.

Quality versus Land Cover
There was no shrubland in Sichuan, so only forest and cropland/grassland/bareland were compared. Sixty-two percent of the reference points were in the forest and 38% of the points were in the cropland/grassland/bareland. In each slope range of Sichuan, the LE90 of the other three DEMs in the forest were higher than those in the cropland/grassland/bareland, except for ASTER GDEM.   Figure 9 shows the spatial distribution of land cover in four study areas. MAD and LE90 were used to visually display the change of quality of DEMs with land cover under different slopes ( Figure 10). When there were less than 10 points in the conditional interval, we did not count the interval's information. The quality of DEMs decreased with the increasing slope under the same land cover.
There was no shrubland in Sichuan, so only forest and cropland/grassland/bareland were compared. Sixty-two percent of the reference points were in the forest and 38% of the points were in the cropland/grassland/bareland. In each slope range of Sichuan, the LE90 of the other three DEMs in the forest were higher than those in the cropland/grassland/bareland, except for ASTER GDEM. The shapes of the polylines in the forest (Figure 10a) were very close to the Figure 7a. The values in Figure 7a were basically between the values of the two l types in Figure 10a. However, at very steep slopes, the values in Figure 7a we than the maximum value of the land cover types in Figure 10a.
There were no shrubland and forest in Xinjiang A. The polylines in Figur the same as those in Figure 7b. The overall law in Xinjiang B was similar to that in Only 20% of the points were in the forest and 80% of the points were in the cropla land/bareland, leading to the shapes of the polylines in the cropland/grassland (Figure 10c  The shapes of the polylines in the forest (Figure 10a) were very close to the shapes in Figure 7a. The values in Figure 7a were basically between the values of the two land cover types in Figure 10a. However, at very steep slopes, the values in Figure 7a were higher than the maximum value of the land cover types in Figure 10a.
There were no shrubland and forest in Xinjiang A. The polylines in Figure 10b are the same as those in Figure 7b. The overall law in Xinjiang B was similar to that in Sichuan. Only 20% of the points were in the forest and 80% of the points were in the cropland/grassland/bareland, leading to the shapes of the polylines in the cropland/grassland/bareland (Figure 10c) consistent with the shapes in Figure 7c. Inner Mongolia had all three land cover types, with 3% of the points (about 300 points) in the forest, 3% of the points in the shrubland and 94% of the points in the cropland/grassland/bareland. Therefore, the shapes of the polylines in the cropland/grassland/bareland (Figure 10d Table 3 shows the statistics of SRTM in the classified three zones. The zone where the number of points was less than 20 was considered invalid, so the statistics for slopes facing towards the radar beam in Xinjiang A and Inner Mongolia were not analyzed. The indicators' values for slopes opposite to the radar beam were the largest among the three zones. The value for slopes facing towards the radar was lower than that for the flat area in Sichuan, while the value for slopes facing towards the radar was higher than that for the flat area in Xinjiang B.  Table 3 shows the statistics of SRTM in the classified three zones. The zone where the number of points was less than 20 was considered invalid, so the statistics for slopes facing towards the radar beam in Xinjiang A and Inner Mongolia were not analyzed. The indicators' values for slopes opposite to the radar beam were the largest among the three   Figure S2 shows a spatial distribution of original differences between the three assessed DEMs with 30-m pixel size and the reference 12-m TanDEM-X DEM before co-registration without eliminating outliers. Negative values indicate that the assessed DEM is lower than the reference DEM, while positive values indicate that the assessed DEM is higher than the reference DEM. Table 4 shows the values of nine transformation parameters used in the co-registration process between the assessed DEMs and the reference DEM. Each parameter was nearly in the same order of magnitudes in different study areas for different assessed DEMs. In the vertical direction, the absolute values of T Z , m z and θ z of ASTER GDEM were the largest, implying the biggest vertical differences between ASTER GDEM and the reference DEM before co-registration, which was consistent with the results mentioned above.  Figure 12 shows the Q-Q plot. The vertical axis is the quantiles of elevation differences between the assessed DEMs and the reference DEM after co-registration. The horizontal axis is the normal quantiles. The red lines represent a normal distribution. All observations showed a strong deviation from the red line, indicating that the residuals had non-normal distribution even after removing the outliers.  The histograms of the differences before and after co-registration are presented in Figure 13. For each histogram, there is a corresponding bar chart showing statistical indicators of the differences. The histograms visually showed systematic errors between the assessed DEM and the reference DEM before co-registration. The x-coordinate at the peak of the histogram after co-registration was closer to zero than that before co-registration. The frequency near the peak increased after co-registration. Table S3 shows more detailed statistical results of the model-to-model comparison. The bar charts and the statistics showed decreases in the indicators' values after co-registration.

Quality versus Local Incidence Angle
We summarized the MAD and LE90 of the three assessed DEMs after co-registration in four study areas ( Figure S3). The co-registered TanDEM

Quality versus Slope
In Figure 6, red represents areas with higher slopes and green represents areas with lower slopes. The overall slopes of Sichuan are relatively high, followed by Xinjiang B. Inner Mongolia has a few high-slope areas. Most areas of Xinjiang A have small slopes. The spatial distribution of slopes is similar to that of the height differences ( Figure S2). We set the slope bin to one degree, and the slope interval where the number of pixels was less than ten was considered invalid. MAD and LE90 were adopted to display the change of DEMs' quality with slopes ( Figure 14). In every subgraph, the indicators when the slopes were below 50 • were drawn to more clearly see the changing trend in the range of 0-50 • degrees. We summarized the MAD and LE90 of the three assessed DEMs after co-registration in four study areas ( Figure S3). The co-registered TanDEM-X DEM with a pixel size of 30 m had the best quality, with MAD varying from 0.18 to 0.88 m and LE90 varying from In four study areas, the growth rates of MAD and LE90 suddenly increased at around 35 , but the values were still around tens of meters. The growth rates further increased at around 70 , where the values increased sharply. In the interval of 88 -89 in Sichuan, the values of LE90 even reached about 1000 m. In each area, the indicators' values of AS-TER GDEM were the largest and the values of 30-m TanDEM-X DEM were the smallest, which was consistent with the results mentioned above.  The whole valid slope interval of Xinjiang A was 0-70 • , and the maximum valid slope of the other three areas could reach to 80-90 • . The relationship between the quality of DEMs and slopes was apparent. As the slope increased, the MAD and LE90 of all DEMs increased, indicating a decrease in the elevation quality. This relationship was the same for all DEMs, but the rise rates for different DEMs were different. In Xinjiang A and Inner Mongolia, where the slopes were greater than 50 • , the indicators' values had apparent fluctuations. In Sichuan and Xinjiang B, there were no such fluctuations.
In four study areas, the growth rates of MAD and LE90 suddenly increased at around 35 • , but the values were still around tens of meters. The growth rates further increased at around 70 • , where the values increased sharply. In the interval of 88-89 • in Sichuan, the values of LE90 even reached about 1000 m. In each area, the indicators' values of ASTER GDEM were the largest and the values of 30-m TanDEM-X DEM were the smallest, which was consistent with the results mentioned above. Figure 15 shows the relationship between quality indicators and aspects. In each area, the assessed DEMs with 30-m pixel size had close values in eight aspects, especially in Xinjiang A and Inner Mongolia. Figure 16 displays the change of quality of DEMs with land cover under different slopes. The rules were similar to the results mentioned in Section 4.1.4. In addition, we found that, in each study area, the relationship between each DEM's quality and different land cover types had the rule that the indicators' values in very low or very high slopes were close, and the values in middle slopes were significantly different for different land cover types. In middle-slope places, the indicators' values in forest were the largest, followed by shrubland. The values in cropland/grassland/bareland were the smallest. Figure 15 shows the relationship between quality indicators and aspects. In each area, the assessed DEMs with 30-m pixel size had close values in eight aspects, especially in Xinjiang A and Inner Mongolia.  Figure 16 displays the change of quality of DEMs with land cover under different slopes. The rules were similar to the results mentioned in Section 4.1.4. In addition, we found that, in each study area, the relationship between each DEM's quality and different land cover types had the rule that the indicators' values in very low or very high slopes were close, and the values in middle slopes were significantly different for different land cover types. In middle-slope places, the indicators' values in forest were the largest, followed by shrubland. The values in cropland/grassland/bareland were the smallest.   Figure 16 displays the change of quality of DEMs with land cover under different slopes. The rules were similar to the results mentioned in Section 4.1.4. In addition, we found that, in each study area, the relationship between each DEM's quality and different land cover types had the rule that the indicators' values in very low or very high slopes were close, and the values in middle slopes were significantly different for different land cover types. In middle-slope places, the indicators' values in forest were the largest, followed by shrubland. The values in cropland/grassland/bareland were the smallest.

Overall Quality
We first discuss the results obtained with ICESat/GLAS points as the ref Since the accuracy of ICESat/GLAS points in mountainous regions is not hig points do not meet the data preprocessing requirements in Sichuan, causing a ber of points with low quality eliminated. Positive values of ME mean that t of these DEMs is generally higher than that of the reference. Part of the reas fluence of the vegetation canopy, where short-wavelength SAR or optical sate cannot penetrate [58]. This overestimation was noted in previous studies [32, TanDEM-X DEMs showed the best quality. The next-best DEM was SR GDEM had the worst quality. This reflects the superiority of InSAR technolog ing elevation in vegetation coverage areas compared with optical technolog be used to estimate the standard deviation more resistant to outliers in the d DEM-X DEM had the smallest MAD, meaning that it had fewer gross errors an robust [36]. C-band penetrates vegetation better than X-band. Therefore, the e tained by C-band SAR should be closer to the surface elevation than that ob

Overall Quality
We first discuss the results obtained with ICESat/GLAS points as the reference data. Since the accuracy of ICESat/GLAS points in mountainous regions is not high [25], many points do not meet the data preprocessing requirements in Sichuan, causing a large number of points with low quality eliminated. Positive values of ME mean that the elevation of these DEMs is generally higher than that of the reference. Part of the reason is the influence of the vegetation canopy, where short-wavelength SAR or optical satellite sensors cannot penetrate [58]. This overestimation was noted in previous studies [32,35,59,60].
TanDEM-X DEMs showed the best quality. The next-best DEM was SRTM. ASTER GDEM had the worst quality. This reflects the superiority of InSAR technology in obtaining elevation in vegetation coverage areas compared with optical technology. MAD can be used to estimate the standard deviation more resistant to outliers in the dataset. TanDEM-X DEM had the smallest MAD, meaning that it had fewer gross errors and was more robust [36]. C-band penetrates vegetation better than X-band. Therefore, the elevation obtained by C-band SAR should be closer to the surface elevation than that obtained by X-band SAR. However, the results of this study are the opposite of the theory. Part of the reason is that the fusion of multi-baseline and multi-aspect TanDEM-X DEMs improves the quality of the final TanDEM-X DEM product, which was also discussed by Liu et al. [58]. Because the acquisition time of the images used to generate the DEMs is different, we should also consider the effects of changes in vegetation and terrain over a period of time on experimental results.
In study areas with more undulating terrain, the overall quality of the 12-m TanDEM-X DEM was better than that of the 30-m TanDEM-X DEM. It means that downsampling will reduce the quality of DEM, especially in mountainous areas [35]. It also demonstrates the advantages of high-resolution DEM in describing terrain details [58]. In the other two study areas with the reference points mainly distributed in flat areas, the overall quality of the 12-and 30-m were very close. Hu et al. [35] presented similar findings.
Next, we discuss the results obtained with 12-m TanDEM-X DEM as the reference data. The fact that the residuals between DEM models had non-normal distribution was also observed in many previous studies [13,20,22]. The changes of the x-coordinate at the peak of the histogram and the frequency near the peak after co-registration reflect the reduction of systematic errors in the vertical direction [22] and the increase in the quality of the geographic location, respectively. The decreases in the indicators' values after coregistration confirmed the conclusions of previous studies [13,23] that systematic deviations could reduce the vertical quality of DEM. These also show that the co-registration method used in this study can be used to correct the systematic errors of these DEMs. That the co-registered 30-m TanDEM-X DEM had the best quality is not unexpected because 30 m data was produced from the 12 m data using the moving-window method.

Quality versus Slope
The slope is the main factor affecting the elevation measurements [34]. The slope can cause geometric distortion in the processing of optical and SAR images, which will affect the extraction of true elevation. We first discuss the results obtained with ICESat/GLAS points as the reference data. The position where the peaks of indicators appeared indicates that the slope impacts image acquisition. As the slope increased, the quality of DEMs tended to decline, meaning the slope strongly influenced the quality of DEMs produced by InSAR or optical photogrammetry technology. The reasons for this phenomenon may be: (1) As the slope increases, the surface becomes rougher. It is challenging to measure ground elevation with optical photogrammetry and InSAR technology. (2) High-slope areas are mainly vegetation, hindering optical photogrammetry and short-wavelength SAR from measuring the ground beneath the tree canopy. (3) When the slope varies greatly, the elevation has obvious fluctuations. The value of a pixel with a size of 12 or 30 m cannot accurately represent the elevation of the land within the pixel, which can cause a large vertical error. In the flat regions, the elevation has small fluctuations, so the pixel value can basically represent the elevation of the land within the pixel. The quality of DEMs decreased with the increasing slope, which had been proven by many previous studies [61,62]. The 30-m DEMs increased more drastically with increasing slopes than the 12-m DEM, which meant that DEMs with higher resolution had a better tolerance for slope changes. When slopes were >25 • , the indicators' values of TanDEM-X DEM with a pixel size of 12 and 30 m were significantly different, which confirmed that higher-resolution DEM could better describe terrain slopes. Previous studies also made similar findings [35]. The quality of ASTER GDEM had the most dramatic decline in areas with steep slopes among the four DEMs, meaning ASTER GDEM's vertical quality was most affected by the slope.
The same DEM had different quality in four study areas within the same slope range. The reasons may be: (1) The main land cover types in Sichuan and Xinjiang B are vegetation, while Xinjiang A and Inner Mongolia are mainly bareland. Therefore, the effects of land cover may cause this situation. (2) The slope value of the pixel is not very accurate in Sichuan and Xinjiang B with large slope changes. The pixel value may underestimate the true slope, making the classification of the errors inaccurate.
Next, we discuss the results obtained with 12-m TanDEM-X DEM as the reference data. The phenomenon that the indicators' values had apparent fluctuations in Xinjiang A and Inner Mongolia could be related to the too few (although > 10) pixels in certain high slope bins, causing the laws in these slope bins not representative enough. In Sichuan and Xinjiang B, where had adequate pixels at high slope bins, there were no such fluctuations. In the interval of 88-89 • in Sichuan, the extremely high values indicated that the quality of DEMs was seriously affected at extremely steep places. Since the 30-m TanDEM-X DEM was generated by resampling the 12-m TanDEM-X DEM, these results also show that the quality of resampling results is affected by the slope.

Quality versus Aspect
The results that the quality of DEM varies with the aspect when taking ICESat/GLAS points as the reference are similar to the previous findings [24]. The reason for the differentiation of the elevation quality with the aspect may be related to the heading of the satellite sensors in the ascending and the descending orbit [63] and the incidence angle of the radar sensor to the surface [51,57].
The strong relationship between aspects and DEM's quality is an indicator of data misregistration [33,64]. As a result, this rule reflected that the co-registration results between DEMs were good, but high slopes and dense vegetation could influence the co-registration effects.

Quality versus Land Cover
We first discuss the results obtained with ICESat/GLAS points as the reference data. The indicators' values of the other three DEMs in the forest were higher than those in the cropland/grassland/bareland, except for ASTER GDEM. It shows that in places with denser vegetation coverage, it is more difficult for the InSAR band to penetrate vegetation, which makes it challenging to obtain the true surface elevation. As a result, the quality of the DEMs in the forest is lower than that in the cropland/grassland/bareland. The quality of ASTER GDEM in the forest was higher than that in the cropland/grassland/bareland on a flat surface. This may be caused by ASTER GDEM's low sensitivity to land cover, which was also mentioned by previous studies [14]. There is an interval of the acquisition time between the DEMs and the land cover map, leading to a slight change of the actual local land cover, which may also affect the results.
In Sichuan, the shapes of the polylines in the forest (Figure 10a) were very close to the shapes in Figure 7a. The reason should be that there are more reference points in the forest, which dominates the trend of all points changing with slopes in Sichuan. The reason the values in Figure 7a were higher than the maximum value of the land cover types in Figure 10a at very steep slopes is that, before calculating the indicators, we used the three-sigma rule to eliminate the outliers. For Figure 7a, we removed the outliers in each slope range. For Figure 10a, we removed the outliers in each slope range under every land cover, thus removing more outliers. Therefore, the quality at very steep slopes was improved. That the effect of different land covers on the quality of DEMs was not obvious in Inner Mongolia may be related to the few points in forest and shrubland. Only 10 s to 100 points were distributed in those two land covers under every slope range, which made the law not obvious.
Next, we discuss the results obtained with 12-m TanDEM-X DEM as the reference data. For different land cover types, the indicators' values in very low or very high slopes were close, and the values in middle slopes were significantly different. This showed that, in flat and extremely steep places, the land cover had a small impact on vertical differences, while, in the middle slope places, the land cover had a great impact on DEM's quality. It should be noted that the reference DEM, such as the three assessed DEMs, is also DSM, not DTM. In very steep places, the reason the quality of DEMs under different land cover was close should be that the slopes dominated the quality [34], where the impacts of land cover on DEMs' differences became little.

Quality versus Number of Images
A larger number of images could contribute to the increasing quality of the final DEM products. For TanDEM-X DEM and SRTM, the number of images indicates the number of valid height values from different InSAR DEM acquisitions available for being fused to generate the final DEM product. In general, TanDEM-X DEM has a larger number of images than SRTM, which may be one of the reasons that TanDEM-X DEM owns a higher quality. For TanDEM-X DEM, the maximum number of images in Sichuan and Xinjiang B is larger than that in the other two study areas. This may be because Sichuan and Xinjiang B have obvious fluctuations. The previous study has similar findings that the most problematic areas are those with only one valid acquisition of TanDEM-X DEM [25]. For ASTER GDEM, the number of images represents the number of images stacked to derive the final elevation values. The previous study [35] also found that the quality of ASTER GDEM got higher with increasing stack numbers.

Quality versus Local Incidence Angle
In quality versus aspect analysis, the aspects were simply divided into eight directions, starting from true north (Section 4.1.3). By classifying the aspect into three zones according to the relationship between the local incidence angle and the look angle, the results obtained were clearer. The data opposite to the sensors have the lowest quality, which was found by the previous study on SRTM-X [15] and DEMs generated from RADARSAT stereopairs [65]. In Sichuan, the quality for slopes facing towards the radar was higher than that for the flat area, while this relationship was just the opposite in Xinjiang B. This may be partly because that the quality of ICESat/GLAS points in mountainous regions is not high [25]. This method could provide a new perspective for the quality assessment of SRTM and other DEMs whose incidence angle files are available. This also reminds us that researchers who want to generate a high-quality DEM by fusion of InSAR-derived intermediate DEMs could set low weights for places opposite to the sensors.

The Reference Data
The 30-m data were downsampled from 12-m resolution DEM using the movingwindow smoothing process. Thus, when taking 12-m TanDEM-X DEM as the reference, that the 30-m TanDEM-X DEM had the best quality is not unexpected. Although the results are consistent with our expectations, the achieved quality of 30-m TanDEM-X DEM is still useful. Because these results can reflect that the downsampling process of TanDEM-X DEM is influenced by slope and landcover, and the 30-m TanDEM-X DEM agree better with the reference DEM after co-registration. It should be noted that the rather low elevation differences and unrealistic quality of 30-m TanDEM-X DEM when taking 12-m TanDEM-X DEM as the reference is because they originate from the same data source.
It might be a little confusing why we used the reference ICESat/GLAS points to assess the quality of another reference 12-m TanDEM-X DEM. This does not conflict with our objective that we want to provide some additional insights on the differences' performance in relation to various influencing factors. Through the consistency performance between 12-m TanDEM-X DEM and ICESat points, we can observe the influence of various factors on the differences between these two data, which is exactly what this article studies. It is also helpful for us to better understand the quality obtained with 12-m TanDEM-X DEM as the reference.

Data Co-Registration
In this study, we used two kinds of reference data for DEMs' quality assessment and comparison. In model-to-model comparison, we did co-registration between models using an improved LZD method to eliminate systematic errors. However, we did not perform co-registration when taking ICESat/GLAS points as reference data. This is because: (1) The diameter of the ICESat/GLAS spot can reach 70 m, and the distance between adjacent spots is about 172 m. By contrast, the possible systematic offsets (see Table 4) are minimal. As a result, co-registration between ICESat/GLAS points and DEMs with 12-and 30-m pixel size is not necessary. (2) The reference points we used are only ICESat/GLAS points, with no other GNSS points, causing the small number and small distribution area of reference points, which can reduce the effect of co-registration. In contrast, the reference DEM is a continuous surface, which is very suitable for co-registration.
It should be noted that the co-registration solves the systematic errors (not only planimetric systematic errors but also vertical systematic errors), so the quality of the evaluated product is not the one that the user of the product will really have. Although the quality analysis of DEMs mainly focused on their vertical quality in this study, we still need to pay attention to the systematic errors. The statistics of nine transformation parameters used in co-registration between the assessed DEMs and the reference DEM (Table 4) show the systematic errors existed in 3D direction, which the researchers need to notice to make a real evaluation of each DEM.

Limitations of Our Study
The source data of the DEMs and the land cover map were obtained at different times, so the vegetation growth and topographic changes during this period would also affect the quality assessment [35]. To minimize the potential impact changing with time, we visually inspected each obvious land cover change in Google Earth, referring to Hawker et al. [24], and confirmed that no major changes occurred. However, small changes in local places could affect the experimental results. Future research should pay attention to this situation.
When exploring the relationship between land cover and DEM's quality, we assessed the quality under different land cover in each slope range to reduce the impact of slope on results. Future studies could also pay attention to the conjoint analysis of slope/aspect.

Conclusions
There are many types of global DEMs distributed by different agencies, and the elevations of different DEMs are not the same. DEM's quality affects the results of subsequent applications. SRTM and ASTER GDEM are the most commonly used global DEMs, and TanDEM-X DEM has attracted users' attention due to its unprecedented accuracy. TanDEM-X DEM and SRTM are InSAR products, and ASTER GDEM is a photogrammetric product. To provide some additional insights on quality assessment of 12-and 30-m resolution TanDEM-X DEMs, 30-m resolution ASTER GDEM and 30-m resolution SRTM, this study assessed differences' performance in relation to geographical features and the ways in which DEMs have been created on selected Chinese sites, taking ICESat/GLAS points with 14-cm absolute vertical accuracy but size of 70-m diameter and 12-m resolution TanDEM-X DEM with less than 10-m absolute vertical accuracy as the reference data.
Differences' performance shows good consistency under the assessment using two reference data. Results show that: (1) TanDEM-X DEMs have the best overall quality, followed by SRTM. ASTER GDEM has the worst quality. InSAR technology has superiority in obtaining elevation in vegetation coverage areas compared with optical technology. The fusion of multi-baseline and multi-aspect TanDEM-X DEMs improves the quality of the final TanDEM-X DEM product. The 12-m TanDEM-X DEM has significant advantages in describing terrain details. Downsampling will reduce the quality of DEM, especially in mountainous areas. (2) The slope can cause geometric distortion in the processing of optical and SAR images, which can affect the extraction of true elevation. The quality of DEMs decreases with increasing slopes. ASTER GDEM's vertical quality is most affected by the slope. SRTM is the second most sensitive to slopes. The 12-m TanDEM-X DEM has a better tolerance for slope changes. (3) TanDEM-X DEM has a larger value in NW and a smaller value in SW. In places where slopes are generally small, the changes of ASTER GDEM and SRTM with aspects are not obvious. In places of large fluctuations, the quality of ASTER GDEM and SRTM are the lowest near N and the highest near S. After the co-registration, the assessed 30-m resolution DEMs have close quality in eight aspects, especially in areas where slopes are generally small. High slopes and dense vegetation could influence co-registration effects. (4) For TanDEM-X DEM and SRTM, the denser is the vegetation cover, the lower is the quality of DEMs because, in places with denser vegetation coverage, it is more difficult for the InSAR band to penetrate vegetation, which makes it challenging to obtain the true surface elevation. ASTER GDEM has low sensitivity to land cover. If there are more reference points in a certain land cover, the shapes of the polylines describing the change of quality with slope under that land cover are consistent with the shapes of the polylines describing the change of quality with slope under all land cover types. (5) The DEMs were created from the fusion of many observations. The quality of DEMs gets higher with the increasing number of images used in the fusion process. (6) For SRTM, which is created from SAR images, the local incidence angle can impact the quality of the DEM as the geometry of the incidence angle strongly influences the quality of the reflected radar beam. The quality in places where the slopes are opposite to the radar beam is the worst. This method could provide a new perspective for the quality assessment of SRTM and other DEMs whose incidence angle files are available. Researchers who want to generate a high-quality DEM by fusion of InSAR-derived intermediate DEMs could set low weights for places opposite to the sensors. (7) The residuals between DEM models had non-normal distribution even after co-registration. The quality of the assessed DEMs increased after co-registration. Each transformation parameter using for co-registration is nearly in the same order of magnitudes in different study areas for different assessed DEMs.
In this study, our results concluded from study areas in China are similar to the results from other areas according to previous studies. Researchers who want to know the quality of a DEM in order to use it in further applications should pay more attention to the terrain factors and land cover in their study areas and the ways in which the DEM has been created. It should be noted that the statistics in the study are influenced by tree heights, especially in hilly and mountainous places. In model-to-model comparison, the quality results of the assessed three 30-m resolution DEMs are restricted by the quality of the reference 12-m resolution TanDEM-X DEM, which is the best we could utilize at this time. Further evaluation is necessary once finer and better reference DEM becomes available. The improved LZD method applied to co-registration is one of our highlights. However, this paper does not focus on the comparison of the co-registration results between our method and other methods for these global DEMs, which is also lacking in the current research. We only describe the reasons and advantages of using the LZD method from a theoretical level. Finally, with more global DEMs distributed, increasing the types of assessed DEMs is necessary for future studies.  Table S3: Statistics of the model-to-model comparison before and after coregistration, Figure S1: The spatial distribution of aspects in four study areas: (a) Sichuan; (b) Xinjiang A; (c) Xinjiang B; and (d) Inner Mongolia, and the distribution of ICESat/GLAS points, Figure S2: Spatial distribution of differences between the assessed DEMs with 30-m pixel size and the reference 12-m TanDEM-X DEM before co-registration without eliminating outliers in: (a) Sichuan; (b) Xinjiang A; (c) Xinjiang B; and (d) Inner Mongolia. Figure S3: The MAD and LE90 of three DEMs after co-registration.