Applicability of Data Acquisition Characteristics to the Identiﬁcation of Local Artefacts in Global Digital Elevation Models: Comparison of the Copernicus and TanDEM-X DEMs

: Several global digital elevation models (DEMs) have been developed in the last two decades. The most recent addition to the family of global DEMs is the TanDEM-X DEM. The original version of the TanDEM-X DEM is, however, a nonedited product (i.e., it contains local artefacts such as voids, spikes, and holes). Therefore, subsequent identiﬁcation of local artefacts and their editing is necessary. In this study, we evaluated the accuracy of the original TanDEM-X DEM and its improved edited version, the Copernicus DEM, in three major European mountain ranges (the Alps, the Carpathians, and the Pyrenees) using a digital surface model derived from airborne laser scanning data as a reference. In addition, to evaluate the applicability of data acquisition characteristics (coverage map, consistency mask, and height error map) and terrain characteristics (slope, aspect, altitude) to the localization of problematic sites, we modeled their associations with the TanDEM-X DEM error. We revealed local occurrences of large errors in the TanDEM-X DEM that were typically found on steep ridges or in canyons, which were largely corrected in the Copernicus DEM. The editing procedure used for the Copernicus DEM construction was evidently successful as the RMSE for the TanDEM-X and Copernicus DEMs at the 90 m resolution improved from 45 m to 12 m, from 16 m to 6 m, and from 24 m to 9 m for the Alps, the Pyrenees, and the Carpathians, respectively. The Copernicus DEM at the 30 m resolution performed similarly well. The boosted regression trees showed that acquisition characteristics provided as auxiliary data are useful for locating problematic sites and explained 28–50% of deviance of the absolute vertical error. The absolute vertical error was strongly related to the height error map. Finally, up to 26% of cells in the Copernicus DEM were ﬁlled using DEMs from different time periods and, hence, users performing multitemporal analysis or requiring data from a speciﬁc time period in the mountain environment should be wary when using TanDEM-X and its derivations. We suggest that when ﬁlling problematic sites using alternative DEMs, more attention should be paid to the period of their collection to minimize the temporal displacement in the ﬁnal products.


Introduction
DEMs are an indispensable data source for various environmental analyses with multiple successful applications, for example, in geology [1], ecology [2], and hydrology [3]. With the growing number of global DEMs available, applications also include multitemporal analyses. Nowadays, global DEMs are increasingly combined for calculations of changes in the glacier mass [4], assessment of topographic changes caused, for example, by volcanic activity [5], or forest change detection [6].
(provided by DLR as auxiliary data); and (iii) to evaluate the suitability of acquisition and terrain characteristics for the localization of the TDX90 problematic sites.

Study Areas and Airborne LiDAR
Three study areas located in mountain ranges were selected, with the availability of highly accurate airborne laser scanning data for subsequent accuracy assessment being an additional selection criterion. We used a DSM as a reference as it is well-recognized that DEMs derived from SAR height measurement include trees, buildings, and other aboveground structures, thus not representing the bare ground elevation in vegetated and built-up areas [36,37]. However, we used digital terrain models (DTMs) to derive the slope and the aspect of the terrain. The study areas were as follows: the Alps, the Pyrenees, and the Tatra Mountains, which are part of the Carpathians (Figure 1). The study areas cover about 3265 km 2 (Alps), 7890 km 2 (Pyrenees), and 960 km 2 (Tatra Mountains), respectively, with altitudes ranging between 300-4800 m a.m.s.l. (Alps), 450-3400 m a.m.s.l. (Pyrenees), and 500-2700 m a.m.s.l. (Tatra Mountains), respectively. The airborne LiDAR data for Valle d'Aosta in the Italian Alps were collected in 2008 and the DSM derived from those data is freely available for downloading at the 2 m resolution. The airborne LiDAR data for Spain (Pyrenees) were collected between 2008 and 2015 and the resulting DSM is freely available for downloading at the 5 m resolution. The airborne LiDAR data for Slovakia (Tatra Mountains) were collected between 2017 and 2020 and the derived DSM is freely available for downloading at the 1 m resolution [38].

TanDEM-X, Copernicus, SRTM and NASA DEMs
The TanDEM-X data were collected between 2010 and 2015 [33]. The nominal TanDEM-X acquisitions were taken in the ascending orbits (right-looking observation mode) over the Northern hemisphere. To improve the accuracy in problematic areas such as high mountains, extra acquisitions with different geometries were taken (i.e., adjusting the angle of signal acquisition or using the descending orbits over the Northern hemisphere; [34,39]). The TanDEM-X DEM in the 3-arcseconds resolution (hereinafter referred to as TDX90) was downloaded from https://download.geoservice.dlr.de/TDM90/ (accessed on 9 August 2021). The horizontal and vertical datum of TDX90 is WGS84. Note that the voids present in TDX90 and the auxiliary data were removed and not used in this study. Other com-ponents used in this study and provided together with TDX90 as auxiliary data are the coverage map, the consistency mask, and the height error map. The coverage map (COV) indicates a number of valid acquisitions available for mosaicking. The consistency mask (COM) is an auxiliary dataset that indicates height inconsistencies between the individual acquisitions [33]. Two general types of height inconsistencies can be distinguished: (i) small absolute height differences exceeding the corresponding thresholds of height errors, e.g., due to temporal changes (COM2, COM10); and (ii) large absolute height differences arising, e.g., due to phase unwrapping errors or incoherent areas such as shadows or layovers (COM1, COM9). If all the height differences are smaller than a certain threshold, all the input height values are considered consistent (COM8). Note that the category "Smaller inconsistency" (COM2) was present only in a single cell in our study areas and was removed from the analysis. The height error map (HEM) represents the theoretical height error of each TDX90 cell as the standard deviation derived from geometrical considerations and interferometric coherence [8,10,33].
Other DEMs evaluated in this study are the Copernicus DEMs with the resolutions of 3 arcseconds and 1 arcsecond (hereinafter referred to as COP90 and COP30, respectively), which we downloaded from the FTP server cdsdata.copernicus.eu (registration required). The Copernicus DEMs were derived from the TanDEM-X data and were subjected to terrain correction and hydrological editing, including automatic operations as well as manual control. Terrain correction consisted of spikes and holes removal, identification and filling of voids (small voids were interpolated, larger voids-infilled with an alternate DEM), and identification and correction of systematic and random biases [26]. The vertical datum for the Copernicus DEMs is an EGM2008 geoid and the horizontal datum is WGS84 [32].
Besides TDX90 and COP90, which are the main focus of this study, we also used SRTM DEM v3 version (void-filled) with the resolution of 90 m (hereinafter referred to as SRTM90) to provide a direct comparison with this widely used spaceborne DEM. The SRTM data were acquired with a C-band radar sensor attached to Space Shuttle Endeavour during its 11-day-long mission in February 2000 [7]. Each terrain segment was recorded on ascending and descending orbit passes. After its first release in June 2003, SRTM became the most commonly used global DEM. In addition, voids and larger problematic sites in TDX90 were filled using SRTM [26,30,31], and for this reason, it is used as an additional reference in this study. In addition, we also used NASADEM at the 30 m resolution (hereinafter referred to as NASA30), which was recently produced by reprocessing the SRTM data and can be considered as an alternative to COP30 [40]. The vertical datum for both SRTM90 and NASA30 is an EGM96 geoid and the horizontal datum is WGS84. We downloaded the SRTM and NASA30 from the Earth Explorer data portal.
Note that the short-wavelength C-band or X-band radar used for acquisition of the abovementioned DEMs is unable to penetrate the vegetation canopy and reach the bare ground; most of the incoming signals are reflected by various scatterers in the upper part of the canopy [41]. Consequently, the elevation values captured by the sensor are located somewhere between the ground and the top of the vegetation canopy. Hence, the TanDEM-X, SRTM, and NASA DEMs are rather DSMs which represent the surface of the Earth including vegetation, buildings, and infrastructure [42][43][44].

Accuracy Assessment and Error Modeling
All the datasets were first transformed into the same vertical and horizontal datum. To match the horizontal datum of the spaceborne DEMs and reference LiDAR models, we transformed all the models using the bilinear resampling method into the same UTM zone. LiDAR DSMs were aggregated using the mean value to the same resolution. The vertical datum of all the datasets is represented as orthometric heights except for TDX90, which represents ellipsoidal heights. In order to match the ellipsoidal heights of TDX90 with the orthometric heights of other models, we used an Earth Gravitational Model (EGM2008).
We calculated height differences between the reference LiDAR DSM and the spaceborne DEMs. We used the differences to calculate the mean error (ME) and the root mean square error (RMSE) expressed as follows: where n is the total number of sampled cells, h DEMi is the ith height from the global DEM, and h REFi is the corresponding height from the LiDAR DSM. The RMSE measures the mean square magnitude of errors. This measurement gives more weight to large deviations such as outliers. Therefore, in addition, we also calculated robust metrics (less affected by outliers [45,46]); namely, we calculated the absolute deviation at the 90% quantile (LE90) and the mean absolute error expressed as follows: where n is the total number of sampled cells, h DEMi is the ith height from the global DEM, and h REFi is the corresponding height from the LiDAR DSM. LE90 was calculated as the 90th percentile of the manually sorted absolute height differences (i.e., 90% of the differences are less than or equal to this value).
In addition, we aimed at evaluating the suitability of acquisition and terrain characteristics for the localization of the TDX90 vertical error. We combined the auxiliary data (HEM, COM, and COV) and the terrain characteristics (slope, aspect, and altitude) to model the absolute vertical error (the difference between TDX90 and LiDAR DSM). The terrain slope and aspect were derived from a LiDAR digital terrain model at the 90 m resolution (i.e., we first aggregated the digital terrain model to the 90 m resolution using the mean values and then calculated the terrain characteristics) [47,48]. We used Horn's algorithm with a 3 × 3 cell neighborhood implemented in ArcGIS (version 10.8.1).
We evaluated the relationships between the differences (absolute values) and the explanatory variables using boosted regression trees (BRT). We used functions provided by Elith et al. (2008) [49] and R package gbm version 2.1.5. [50]. The models were fitted with the maximum number of trees of 5000, the learning rate of 0.05 (controls the rate at which the model complexity is increased; small values allow the model to grow more trees), the bag fraction of 0.5 (the proportion of the data used when selecting the optimal tree number), and without interaction terms (i.e., the tree complexity of 1). The absolute vertical error was modeled specifying the Gaussian error distribution. To estimate the optimal number of trees, we used a fivefold cross-validation. This consisted of randomly dividing the dataset into five independent partitions, calibrating the model on the basis of four partitions and, subsequently, validating it with the remaining one. This procedure was repeated five times, each time leaving out another partition. At each iteration, the residual deviance was calculated and the number of trees giving the best model (i.e., the lowest residual deviance) was identified [49]. The overall performance was assessed using the total deviance explained (i.e., the proportion of the variance explained by the model), which was calculated by dividing the difference between the mean total deviance and the estimated fivefold cross-validated residual deviance by the mean total deviance. In order to estimate the utility of the variables for the absolute vertical error modeling, we calculated the relative importance of each variable (i.e., the contribution of each variable to the model fit [49]). The importance of each variable was scaled so that the sum adds to 100 and represents how much the model was improved by including that particular variable. The models were fitted in R, version 3.6.0 (R Development Core Team).

Results
3.1. Absolute Vertical Accuracy of TDX90, COP90, COP30, SRTM90, and NASA30 First, we performed an overall accuracy assessment of all the DEMs (TDX90, COP90, SRTM90, COP30, and NASA30). The mean differences between the reference DSM and the spaceborne DEMs were small-up to ± 2 m, except for TDX90 in the Alps (Table 1). In terms of LE90, the absolute vertical accuracy of TDX90 was the highest in the Pyrenees, followed by the Tatra mountains and the Alps. The differences between the RMSE and the MAE are the most striking features of this comparison (Table 1). While according to the RMSE, SRTM is a better model than TDX90 in all areas, the situation is reversed when using the MAE (which is less affected by outliers). Note also that the difference between the RMSE and the MAE is considerably higher for TDX90 than for SRTM90. This indicates that TDX90 contains many cells with a relatively large error (i.e., artefacts such as spikes and holes). Importantly, the absolute vertical accuracy of COP90 is better than that of TDX90 in all the study areas (and it also outperforms SRTM90), which shows that the editing process was successful. When comparing DEMs with the 30 m resolution, COP30 mostly outperformed NASA30. Our results show a noticeable effect of the terrain slope on the vertical accuracy of all the evaluated DEMs (Table 2), with the increasing slope leading to decreasing accuracy. All the models tended to slightly underestimate or overestimate the reference DSM throughout all the slope categories, with the notable exception of the steepest slopes (>40 • ) where TDX90 considerably underestimates the reference DSM. In terms of LE90, TDX90 had a higher accuracy than SRTM for all the slope categories, except for the steepest slopes (>40 • ). When using the MAE, on the other hand, the accuracy of TDX90 and SRTM was more or less the same in the steepest category (>40 • ). The high difference between LE90 and the MAE implies a high presence of outliers in TDX90 in this category (i.e., slopes >40 • ). COP90 had a similar or better accuracy than TDX90 in all the evaluated categories of slopes regardless of the validation metric and performed better than SRTM90. As for the models at the 30 m resolution, COP30 was more accurate than NASA30 throughout all the slope categories. The terrain aspect also affects the global DEMs' vertical accuracy. The highest variation of the accuracy with respect to the terrain aspect was observed in the Alps, where LE90 of TDX90 varied from 47.48 m (in the areas with the 90-112.5 • aspect) to 7.92 m (in the areas with the 337.5-360 • aspect). In the Alps, LE90 of TDX90 was higher in the areas with east-and west-facing slopes and lower in the areas with north-and south-facing slopes ( Figure 2). LE90 varied with the aspect by up to 9.42 m and 4.41 m for COP90 and SRTM90 and by up to 12.18 m and 4.76 m for COP30 and NASA30, respectively. The strong dependence of accuracy on the aspect in the Alps is a coincidence of several factors rather than a generally presented pattern in TDX90. Indeed, in the remaining areas (the Pyrenees, the Tatra Mountains), the effect of the aspect on the vertical accuracy was much smaller compared to the Alps (Figure 2). The first factor contributing to this phenomenon is that the study area of the Alps has steeper slopes than the other areas. The other factor is the generally lower number of acquisitions compared to the other areas. For example, the maximum number of acquisitions in the Alps was eight, compared to as much as thirteen in the Pyrenees. This resulted in large outliers concentrated in a few places with mainly east and west orientation and, hence, in the inflation of LE90 (see Figure 3 for the map of the slope, the aspect, and the absolute vertical error of TDX90 in the Alps' study area).  Table 90. quantile (LE90) as a function of the terrain aspect when compared with LiDAR DSMs in the Alps (left), the Pyrenees (middle), and the Tatra mountains (right). Figure 3. Slope, aspect, and absolute vertical error of the TanDEM-X DEM and the fill source of COP90 in the Alps' study area. Note that largest absolute vertical errors mostly coincide with areas of very steep slopes of east-and west-orientation. Note the correspondence between sites of high vertical error and COP90 sites, which were filled with external reference data (filling mask).

Visual Inspection of the Most Problematic Sites
In addition to the above analyses, we also assessed the improvement in COP90 compared to TDX90 visually (Podobnikar, 2009). TDX90 represents the topography well in most places but it fails in some parts (see examples of such problematic parts in Figure 4). Note that in these sites, COP90 and SRTM90 correspond with the LiDAR DSM much better than TDX90. In general, the largest errors are observed on steep slopes, sharp ridges, and in deep valleys, which is expected as the mountain environment is challenging for SAR acquisitions (Toutin, 2002). In the Alps (Valle d'Aosta), artefacts are rather scattered on individual ridges (for example, near the Becca di Luseney mountain). In the Pyrenees, few ridges show similar errors as in the Alps; the most obvious problem is, however, the absence of the Cañón de Añisclo. The highest concentration of artefacts can be found in the highest part of the Tatra mountains (around its highest peak Gerlachovský štít) where multiple peaks and ridges are poorly represented in TDX90 (Figure 4).

Association of the Accuracy of TDX90 with Auxiliary Characteristics (COM, COV)
First, we assessed the vertical accuracy of TDX90 with respect to the height inconsistencies among the contributing scenes. We also included COP90 and SRTM90 to provide a comparison, but we did not include COP30 and NASA30 in this part of evaluation to avoid influence of the horizontal resolution on the vertical error. TDX90 showed great accuracy for pixels in the COM8 category (all heights were consistent), in which LE90 was between 3.6 m and 5.7 m. COP90 and TDX90 showed similar accuracies in terms of LE90, but in terms of the RMSE, COP90 showed better accuracy than TDX90, indicating that some minimum editing in this category was successfully performed (Table 3). However, only less than 20% of cells in our study areas fall into this category. Most of the cells in our study areas fall into the categories COM9 and COM10 (categories indicating larger or smaller inconsistency, respectively, with at least one consistent height pair). These categories are associated with lower accuracy of TDX90 than the "All heights are consistent" category (COM8). LE90 of the cells in the category COM9 for TDX90 and COP90 ranged between 10.3 m and 41.9 m and between 9.2 m and 17.8 m, respectively, and LE90 of the cells in the category COM10 ranged between 6.6 m and 9.8 m and between 7.0 m and 9.7 m, respectively. It is, therefore, evident that editing of the cells was required-particularly in the COM9 category-and that it was successful as LE90 is considerably lower for COP90 than for TDX90 (Table 3). Table 3. Accuracy measures for TDX90, COP90, and SRTM90 with respect to the consistency mask (COM) in comparison with respective LiDAR DSMs. The classification of all the data into the COM categories was based on the TanDEM-X consistency mask. COM1: larger inconsistency, COM4: only one coverage, COM8: all heights are consistent, COM9: larger inconsistency but at least one consistent height pair, COM10: smaller inconsistency but at least one consistent height pair. A very small proportion of cells (less than 1% in all the study areas) fell into the categories "Only one coverage" (COM4) or "Larger inconsistency" (COM1). Most cells falling into these categories were found in the study area of the Alps and, despite their relatively low proportion, they can still encompass tens of square kilometers. LE90 of these categories for TDX90 and COP90 ranged between 225.4 m and 558.6 m and between 53.5 m and 78.9 m, respectively. Thus, again, the improvement in accuracy and successful editing of COP90 is evident. Despite this improvement, however, the accuracy of COP90 in the categories COM1 and COM4 was considerably lower than in the categories COM8, COM9, and COM10, respectively (Table 3). This is likely due to the fact that for filling in the problematic sites in TDX90, SRTM was predominantly used, and the accuracy of SRTM90 in such cells is also low. The poor accuracy of both these models in such cells indicates that they represent locations that are likely problematic for synthetic aperture radar (SAR) interferometry in general.

Coverage
The number of acquisitions ranged between 2 and 8 in the Alps, between 2 and 13 in the Pyrenees, and between 3 and 10 in the Tatra Mountains, respectively. Therefore, we assessed the role of the number of acquisitions within the consistency mask categories COM8, COM9, and COM10 (the results for COM1 and COM4 are not shown as they have poor accuracy anyway, see above). Our results show that with an increasing number of acquisitions, accuracy within individual consistency mask categories (COM8, COM9, and COM10) generally increases. Cells falling within the "All heights are consistent" category (COM8) with at least three valid acquisitions (COV3) provide excellent accuracy; in these categories, LE90 of TDX90 ranged among the COV categories between 3.8 m and 6.5 m in the Alps, between 3.9 m and 6.5 m in the Pyrenees, and between 2.5 m and 5.1 m in the Tatra Mountains, respectively (Table 4). Cells in the category "Smaller inconsistency but at least one consistent height pair" (COM10) with a minimum of three valid acquisitions (COV3) also show a relatively good accuracy; depending on the study area, LE90 was slightly higher or lower than 10 m ( Table 4). The most problematic cells of this subgroup in terms of accuracy are those with larger inconsistencies but at least one consistent height pair (category COM9). In particular, the combination of large inconsistencies (COM9) and a relatively low number of acquisitions (COV < 6) results in poor accuracy. Importantly, we observed a considerable improvement in COP90 over TDX90 in almost all the COM/COV categories (Table 4).

Utility of Auxiliary Data and Terrain Characteristics for the Identification of Problematic Sites
Here, we aimed at evaluating the suitability of the acquisition and terrain characteristics for estimating the absolute vertical error (TDX90 minus the LiDAR DSM). Visual assessment of the data acquisition characteristics showed that the problematic sites tended to be located in the pixels with a relatively low number of acquisitions (COV) in combination with height inconsistencies among individual acquisitions (COM) and with high values of the theoretical height error (HEM; Figure 5). The BRT models of the absolute vertical error (|dh|) explained between 28% and 50% of the deviance. The auxiliary characteristics height error map (HEM) and consistency among the scenes (COM) together explained most of the explained variability (between 77% and 93%). The DLR-provided estimate of the magnitude of errors, the HEM, proved to be highly suitable for the identification of the problematic sites as this parameter in itself explained more than 60% of the explained variability ( Table 5). The effect of COM differed between the study areas, showing a large effect in the Pyrenees, a moderate effect in the Alps, and no effect in the Tatra Mountains. The effect of the remaining variables was minimal (see Table 5). Figure 5. Auxiliary TanDEM-X data of the three problematic sites of low TDX90 quality: the Alps (left), the Pyrenees (middle), and the Tatra Mountains (right). The most problematic sites are delimited with a dashed line. Visual assessment of the auxiliary data suggests that they can indeed be used to identify sites of low TDX90 quality. Note that the problems are mostly located in the areas of low coverage (COV), larger inconsistencies (COM = 9), and high estimated height error (HEM). Table 5. Performance of BRT models of the absolute vertical error (|dh|) of TDX90 and the relative importance of predictors.  [8] suggested that the HEM can be regarded as a reliable estimate of the vertical error, but, more recently, Gdulová et al. (2020) [10] showed that the HEM underestimates the vertical error. To further analyze the relationship between the HEM and the vertical error, we generated partial dependency plots of this variable ( Figure 6). The cells with HEM values lower than approximately 0.5 m showed excellent accuracy, but above that value, the difference between TDX90 and the LiDAR DSMs started to increase considerably. The plots showed a gradual increase of the absolute value of the height error up to the HEM value of 2-3 m, with minimal changes or even a slight decrease as the HEM continued to increase (note that the decrease cannot be properly evaluated due to relatively few measurements of relatively high HEM values). The modeled relationship between the absolute vertical error and the HEM corresponds with the measured error (see boxplots in Figure 6).

Discussion
In this study, we assessed the vertical accuracy of five global DEMs derived from TanDEM-X and/or SRTM in three major European mountain ranges. COP90 shows considerable improvement over TDX90 and SRTM90. Overall LE90s of TDX90 and COP90 were 17.98 m and 12.69 m, 6.81 m and 6.77 m, and 11.98 m and 11.51 m in the Alps, in the Pyrenees, and in the Tatras (Carpathians), respectively, when compared to the LiDAR DSMs. COP30 showed a similar accuracy to its coarser resolution counterpart COP90 and performed better than NASA30.
The accuracies reported in our study areas are generally lower than those reported in other studies but they reflect the challenging mountain environment [8,9,15]. Despite the postprocessing efforts by the German Aerospace Center that included inspection of the differences between the TanDEM-X DEM and the SRTM-C-band DEM [33], our evaluation and visual inspection revealed that in the mountain environment, TDX90 contains many cells with a relatively large error (i.e., artefacts such as spikes and holes), which is not a surprise as TDX90 is a nonedited product derived from interferometric radar processing and mosaicking only. On the other hand, it should be highlighted that the area needing improvement is relatively small. For example, less than 7% of all the cells in the Pyrenees study area were edited or filled with external DEMs (see Table 6), which is remarkable as the mountain environment is challenging for SAR acquisitions and complies with prior studies that highlighted the excellent accuracy of the TanDEM-X DEM [8][9][10]. In general, the largest errors are observed on steep slopes, sharp ridges, and in deep valleys. The Copernicus DEMs, on the other hand, are edited; spikes and holes were identified and voided out, small voids were filled by interpolation, larger voids were infilled using alternative DEMs [32]. Indeed, both performed better than TDX90, SRTM90, and NASA30 in all the study areas, which indicates that the editing process was successful. This corresponds with prior reports. Guth and Geoffroy (2021) [51] reported that the accuracy of COP90 was superior to other global DEMs over eight study sites located in the USA and Europe. Besides, the Copernicus DEM documentation reports that the absolute vertical accuracy in terms of LE90 is below 2 m and approximately 50% of Europe and more than 60% of the rest of the world, respectively, has an accuracy better than 2 m [32].
The gaps and problematic sites in COP90 were, however, filled with external reference data. The extent of this infilling with external data can reach up to tens of percent in mountainous areas, such as our study areas. This information is stored in the filling mask (FLM; part of the Copernicus DEM's auxiliary information), which indicates the source of filled pixels (i.e., the respective global DEM used for filling in the particular area). The prevailing models used for filling of the Copernicus DEMs in our study areas were the SRTM and ASTER DEMs, which were, however, acquired at different time periods from the TanDEM-X data (see Figure 3 and Table 6 for a summary of the number of cells that were replaced by data from other models). Consequently, it is necessary to bear this fact in mind as it also means that the suitability of the Copernicus DEMs for multitemporal analysis in the mountainous environment is limited and users should always consult auxiliary data such as the filling mask before any analysis requiring data from a specific time period [30,32] and remove such areas from analysis. Besides, when using TDX90, for example in combination with the upcoming TanDEM-X Change DEM [52], users should use auxiliary data, such as the HEM, COM, and COV, to identify possible problematic areas [10].
It is fair to note here that temporal misalignment between datasets used in this study might have also affected our results. Due to the limited data availability, it is a common practice to evaluate accuracy of global DEMs using the reference data from different timepoints [9,53,54]. Our study is not an exception, and the time of acquisition should be considered when interpreting the results. The LiDAR data of the Alps were collected in 2008 (i.e., before the TanDEM-X mission), the acquisition time over the Pyrenees more or less overlapped with the TanDEM-X mission, and the data for the Tatra Mountains were acquired a few years after the mission. Obviously, all the reference data were acquired several years after the SRTM mission (February 2000; [7]). Consequently, additional bias between the SRTM, TanDEM-X, and reference DSMs that might have resulted from vegetation growth or deforestation is to be expected.
Our results indicate that data acquisition characteristics that are available as the TDX90 auxiliary data (HEM, COM, COV) can be used for the identification of problematic sites with large errors. These results are consistent with prior studies that evaluated the role of acquisition characteristics on the TDX90 accuracy [8,10,29,55]. We identified the cells falling within the categories "Only one coverage" (COM4) and "Larger inconsistency" (COM1) as highly problematic, with a high risk of erroneous measurement and presence of artefacts. The cells in these categories should be subject to a thorough accuracy evaluation before being included in any analysis (less than 1% of cells in our study areas). Our results show that with the increasing number of acquisitions, the accuracy generally increases as the error can be averaged and smaller deviations are obtained in the resulting height estimate [34]. The cells within the "All heights are consistent" category (COM8) provide excellent accuracy. The cells within the "Smaller inconsistency but at least one consistent height pair" category (COM10) also show a relatively good accuracy, depending on the study area. The combination of large inconsistencies (COM9) and a relatively low number of acquisitions (COV < 6) results in poor accuracy. The DLR-provided estimate of the magnitude of errors, the HEM, proved to be highly suitable for the identification of the problematic sites. A quick inspection of a differential DEM or hill-shaded terrain only in the areas of the highest HEM values can facilitate quick (albeit rough) identification of problematic sites.

Conclusions
We have shown that the Copernicus DEM at the 30 m and 90 m resolution is clearly a more accurate representation of Earth's surface than the TanDEM-X, SRTM, and NASA DEMs. The RMSE for the TanDEM-X DEM and the Copernicus DEM improved from 45 m to 12 m, from 16 m to 6 m, and from 24 m to 9 m for the Alps, the Pyrenees, and the Carpathians, respectively. In other words, editing used for the generation of the Copernicus DEM from the TanDEM-X DEM was successful. However, the Copernicus DEM includes data from different time periods and, hence, their value for multitemporal analyses is limited. We suggest that when filling problematic sites using alternative DEMs, more attention should be paid to the period of their collection to minimize the temporal displacement in the final products. Finally, we have shown that data acquisition characteristics, particularly the height error map and the consistency mask, are useful for the localization of problematic sites in the TanDEM-X DEM that require further editing.