Digital Elevation Model Quality Assessment Methods: A Critical Review

: Digital elevation models (DEMs) are widely used in geoscience. The quality of a DEM is a primary requirement for many applications and is affected during the different processing steps, from the collection of elevations to the interpolation implemented for resampling, and it is locally influenced by the landcover and the terrain slope. The quality must meet the user’s requirements, which only make sense if the nominal terrain and the relevant resolution have been explicitly specified. The aim of this article is to review the main quality assessment methods, which may be separated into two approaches, namely, with or without reference data, called external and internal quality assessment, respectively. The errors and artifacts are described. The methods to detect and quantify them are reviewed and discussed. Different product levels are considered, i.e., from point cloud to grid surface model and to derived topographic features, as well as the case of global DEMs. Finally, the issue of DEM quality is considered from the producer and user perspectives.


Introduction
Terrestrial relief is essential information to represent on a map.For centuries, it has been displayed symbolically to indicate the presence of landforms like mountains or valleys (Figure 1a).By definition, a landform is "any physical, recognizable form or feature on the Earth's surface having a characteristic shape and produced by natural causes" [1] and is the result of endogenous processes such as tectonic motion and exogenous ones such as climate [2].During the 19th and the first half of 20th century, when topographic methods became more operational and geodetically more rigorous in terms of cartographic projection and physical altitude definition, a new relief representation appeared on topographic maps with elevation value indications and contour lines (Figure 1b).
To produce these maps, elevation values must be collected point by point in field surveys, and contour lines are then generated by interpolation.While this type of primitive survey has evolved towards more efficient techniques, it has long been supplemented by photogrammetry.Although terrestrial photogrammetry was first proposed as early as the 1860s by Captain Laussedat [3], this technique began to be seriously implemented at the end of the 19th century, initially in mountainous areas where it brought clear advantages, like in the French Alps by Henri and Joseph Vallot, and in western Canada by Gaston Deville.Although the stereoplotter industry began in the early 20th century, photogrammetry became fully operational after World War I with the development of aerial photography, and it became the standard method for elevation mapping worldwide after World War II.Several major innovations took place during the last quarter of the 20th century: black and white photography gave way to color, analytical photogrammetry became the standard for three-dimensional (3D) mapping, digital databases were implemented by mapping agencies, and a growing variety of platforms, sensors, and processing methods extended the possibility of photogrammetry, which became an increasingly powerful toolbox [4]: New platforms extended the capabilities of the aircraft: satellites covering larger areas (first TIROS observation satellite in 1960, across-track stereoscopy from different orbits with SPOT-1 in 1986, along-track stereoscopy with SPOT-5 in 2002), UAVs (fixed-wing for large area surveys, multirotor for small area surveys requiring agility and precision), and mobile mapping systems on land vehicles.

•
New processing methods were developed, such as aerotriangulation and automatic image matching that made it possible to process large quantities of images for the industrial production of orthophoto mosaics and altimetric databases with few ground control points [5], and the synergy between digital photogrammetry and computer vision led to very powerful 3D reconstruction methods such as SfM (structure from motion).

•
New sensors broadened the imaging possibilities, like spaceborne and airborne SAR (synthetic aperture radar) [6][7][8] and lidar [9,10], i.e., active sensors that made it possible to acquire data day and night, even in cloudy areas and beneath forests.
These innovations contributed to the rapid development of geographic information technologies by providing a vast amount of data, among which the digital elevation model (DEM) is one of the most widely utilized.Both public and private initiatives have led to an increasing availability of digital elevation models, either produced on demand or stored in multi-user databases.This has stimulated the development of a wide range of applications in which a quantitative description of the terrain geometry is needed, such as geoscientific studies (flood or landslide hazard assessment) [11,12] and those based on spatial analysis (soil properties, location of a telecommunication tower or of a new company) [13,14], etc., as well as image orthorectification [15,16].Some of those applications require shape and topology description based on quantitative terrain descriptors extracted from DEMs, such as slope maps, drainage networks, or watershed delineation, leading to the emergence of a new scientific discipline called "geomorphometry" [12,[17][18][19].Several major innovations took place during the last quarter of the 20th century: black and white photography gave way to color, analytical photogrammetry became the standard for three-dimensional (3D) mapping, digital databases were implemented by mapping agencies, and a growing variety of platforms, sensors, and processing methods extended the possibility of photogrammetry, which became an increasingly powerful toolbox [4]: New platforms extended the capabilities of the aircraft: satellites covering larger areas (first TIROS observation satellite in 1960, across-track stereoscopy from different orbits with SPOT-1 in 1986, along-track stereoscopy with SPOT-5 in 2002), UAVs (fixed-wing for large area surveys, multirotor for small area surveys requiring agility and precision), and mobile mapping systems on land vehicles.

•
New processing methods were developed, such as aerotriangulation and automatic image matching that made it possible to process large quantities of images for the industrial production of orthophoto mosaics and altimetric databases with few ground control points [5], and the synergy between digital photogrammetry and computer vision led to very powerful 3D reconstruction methods such as SfM (structure from motion).

•
New sensors broadened the imaging possibilities, like spaceborne and airborne SAR (synthetic aperture radar) [6][7][8] and lidar [9,10], i.e., active sensors that made it possible to acquire data day and night, even in cloudy areas and beneath forests.
These innovations contributed to the rapid development of geographic information technologies by providing a vast amount of data, among which the digital elevation model (DEM) is one of the most widely utilized.Both public and private initiatives have led to an increasing availability of digital elevation models, either produced on demand or stored in multi-user databases.This has stimulated the development of a wide range of applications in which a quantitative description of the terrain geometry is needed, such as geoscientific studies (flood or landslide hazard assessment) [11,12] and those based on spatial analysis (soil properties, location of a telecommunication tower or of a new company) [13,14], etc., as well as image orthorectification [15,16].Some of those applications require shape and topology description based on quantitative terrain descriptors extracted from DEMs, such as slope maps, drainage networks, or watershed delineation, leading to the emergence of a new scientific discipline called "geomorphometry" [12,[17][18][19].
As with any other product, the ability of a DEM to fulfil user requirements is characterized by a number of quality criteria.Once the general characteristics of the DEM have been agreed (resolution, spatial coverage, date etc.), the main requirements relate to the quality of the data themselves, which can be divided into two main types, namely, elevation quality (generally defined in terms of absolute or relative accuracy) and shape and topologic quality (related to the accuracy of DEM derivatives such as slope, aspect, curvature etc. that are quantitative descriptors of the relief [11,[20][21][22][23]).The quality of these descriptors is decisive for the applications that use them [24].If we express these two types of quality criteria in natural language, they mean that the model must be as close as possible to the real terrain position, and that it must look as much as possible like the real terrain, respectively.These two types of requirements are not equivalent, and it seems relevant to consider both of them.Indeed, a small position error can induce erroneous shape representations [25][26][27].On the other hand, a high autocorrelation of the elevation error tends to preserve terrain shape quality despite high positional errors [28].
The terrestrial relief surface has specific characteristics, different from those of most surfaces represented in geographic information, which make DEM quality assessment a challenging task: DEM is multi-user information, so that multiple quality criteria must be met, and although a particular user may specify application-driven requirements, the case of multi-user databases is obviously more complex.

•
The Earth's surface is a material object that speaks to our senses, so that one can complain about unrealistic relief modelling even in an unknown region; this leads to strong requirements including aesthetic ones.
The aim of this article is to discuss the issue of DEM quality assessment and to propose a review of the main approaches.Defining the nominal terrain and specifying the characteristics of the digital product are two prerequisites of DEM quality assessment that are addressed in Sections 2 and 3, respectively.Section 4 is the core of the article, with a review of relevant quality criteria for DEMs.This review encompasses a variety of quality assessment methods, based or not on reference data.Section 5 shows that these criteria can be considered at different levels, i.e., from point cloud to grid surface model, from grid surface model to topographic features, and at the global level.Section 6 shows that the resolution of a DEM should also be considered in its quality assessment.Finally, an overall discussion is proposed in Section 7.

Prerequisite: Definition of the Nominal Terrain
Assessing the quality of a DEM requires a clear and explicit definition of the nominal surface, i.e., the physical surface which is supposed to be modelled.Two nominal surfaces are often considered, namely the ground surface (represented in a DTM-digital terrain model) and the upper surface above the trees, buildings and all other natural or manmade objects (represented in a DSM-digital surface model, provided by most DEM production techniques such as photogrammetry and short wavelength radar technologies).It should be noted that DEM is a generic term that applies to both cases.
The choice between DTM and DSM depends on the foreseen application.For instance, a DSM is well suited for orthorectification of imagery as it requires information on the elevation of the top of buildings and trees, i.e., objects visible in the images, whereas a DTM is required for hydrological modelling that needs information on the level of the ground where surface water runs off.
As illustrated in Figure 2, the DSM is defined over any area, including in the presence of buildings (roofs) or trees (canopy).On the other hand, the DTM is defined in bare terrains as well as in forested areas (the ground surface exists and it controls the surface runoff although the presence of trees modifies the process), but not in the presence of a building, since in this case the ground surface does not physically exist (it can be interpolated to generate a continuous surface but this is meaningless for many applications, like hydrological modelling).
The nuance between DSM and DTM does not exist on other planets, nor on the seafloor, while it is relevant for the modelling of Earth continental surfaces due the abundance of trees and buildings.In many landscapes, the reality is even more complex, for example in the presence of grass vegetation whose height would be comparable to the precision of the topographic method: validating a DEM obtained by photogrammetry or even lidar, where the measured elevation lies at the top of the grass, by comparison with control points measured on the ground, can lead to an erroneous conclusion on the accuracy of the DEM and therefore of the mapping method, since the reference data do not represent the same surface as the DEM.Consequently, it would be meaningless to evaluate the quality of a DEM without a clear definition of the physical surface it is supposed to represent.The nuance between DSM and DTM does not exist on other planets, nor on the seafloor, while it is relevant for the modelling of Earth continental surfaces due the abundance of trees and buildings.In many landscapes, the reality is even more complex, for example in the presence of grass vegetation whose height would be comparable to the precision of the topographic method: validating a DEM obtained by photogrammetry or even lidar, where the measured elevation lies at the top of the grass, by comparison with control points measured on the ground, can lead to an erroneous conclusion on the accuracy of the DEM and therefore of the mapping method, since the reference data do not represent the same surface as the DEM.Consequently, it would be meaningless to evaluate the quality of a DEM without a clear definition of the physical surface it is supposed to represent.
Given the variety of relief mapping techniques, which are sensitive to different aspects of the terrain surface, it is highly recommended that the most appropriate technique be chosen according to the selected nominal terrain.In the presence of forests, photogrammetry and short wavelength radar techniques are more appropriate for DSM than for DTM, while lidar and long wavelength radar techniques are suitable for both surface models [29,30].However, it is not always possible to access the ideal technique.For example, it is very common to use photogrammetry to produce a DTM (i.e., the nominal terrain is the ground surface).In order to prevent users from making gross errors by using such information in forested areas, postprocessing strategies must be employed to remove tree height so that the DEM can theoretically be considered as a DTM.
In the case of sand or ice deserts, two possible nominal surfaces can be considered, namely, the bedrock and the upper sand or ice surface which is in contact with the atmosphere.It is relevant to define which of these surfaces is supposed to be represented in a DEM since the expectations will be different.Indeed, the bedrock is stable over time and its shape is most often generated by tectonics and long-term hydric erosion, while the shape of the upper sand or ice surface is constantly changing due to climatic processes and gravity.No operational technique, except field survey, is currently available to map the substratum in such areas for the Earth, but the future Biomass mission could bring new opportunities thanks to the penetrating capabilities of its P-band radar [31].
The case of shallow water bathymetry (i.e., rivers or coastal sea water) raises the question of the nominal terrain as well.Indeed, the surface and the bottom of the water have different physical behaviours with regards to existing imaging sensors and the image of this double surface must be interpreted with caution.For example, photogrammetry and green lidar can survey the bottom topography in the case of clear water [32,33] but the signal is attenuated by the presence of the water, whereas radar imagery only surveys the surface but the image is influenced by the bottom topography.At low tide, only the ground appears and it can be reconstructed in 3D [34,35].
The nominal terrain considered in a DEM, which is supposed to be defined in agreement with the user's requirements, should be clearly indicated for users' information.Nevertheless, important Given the variety of relief mapping techniques, which are sensitive to different aspects of the terrain surface, it is highly recommended that the most appropriate technique be chosen according to the selected nominal terrain.In the presence of forests, photogrammetry and short wavelength radar techniques are more appropriate for DSM than for DTM, while lidar and long wavelength radar techniques are suitable for both surface models [29,30].However, it is not always possible to access the ideal technique.For example, it is very common to use photogrammetry to produce a DTM (i.e., the nominal terrain is the ground surface).In order to prevent users from making gross errors by using such information in forested areas, postprocessing strategies must be employed to remove tree height so that the DEM can theoretically be considered as a DTM.
In the case of sand or ice deserts, two possible nominal surfaces can be considered, namely, the bedrock and the upper sand or ice surface which is in contact with the atmosphere.It is relevant to define which of these surfaces is supposed to be represented in a DEM since the expectations will be different.Indeed, the bedrock is stable over time and its shape is most often generated by tectonics and long-term hydric erosion, while the shape of the upper sand or ice surface is constantly changing due to climatic processes and gravity.No operational technique, except field survey, is currently available to map the substratum in such areas for the Earth, but the future Biomass mission could bring new opportunities thanks to the penetrating capabilities of its P-band radar [31].
The case of shallow water bathymetry (i.e., rivers or coastal sea water) raises the question of the nominal terrain as well.Indeed, the surface and the bottom of the water have different physical behaviours with regards to existing imaging sensors and the image of this double surface must be interpreted with caution.For example, photogrammetry and green lidar can survey the bottom topography in the case of clear water [32,33] but the signal is attenuated by the presence of the water, whereas radar imagery only surveys the surface but the image is influenced by the bottom topography.At low tide, only the ground appears and it can be reconstructed in 3D [34,35].
The nominal terrain considered in a DEM, which is supposed to be defined in agreement with the user's requirements, should be clearly indicated for users' information.Nevertheless, important altimetric products such as SRTM and TanDEM-X global DEMs do not clearly specify the nominal terrain.Although X-and C-band SAR interferometry rather provides a DSM and no filtering is applied to the native products to transform them into DTMs, all information about these products suggests that they are specified as ground surface models, and many users regard them as such [36][37][38].Considering that the nominal surface is the ground implies that elevation discrepancies due to tree height must be interpreted in terms of DEM error.In other words, they contribute to the fact that the DEM does not fulfil user requirements for geoscientific studies.
The need for a clear specification of the nominal terrain in the metadata has become even more critical with the growing development of high resolution and high precision products, which have stimulated applications based on the modelling of small objects, such as buildings or parts of buildings, in synergy with topographic methods like terrestrial or mobile laser scanning of urban scenes.In the case of buildings that are very complex objects, the need for simplification has led to the definition of level of detail (LOD) standards that can help to specify user requirements [39,40].Moreover, undesirable objects such as vehicles or street furniture may appear in the input data and the need to include them depends on the DEM application.Similarly, when a bridge spans a road or a river, the product specification must clearly indicate whether the bridge is modeled (which could lead to large errors in hydrological modeling) or not (which could generate artifacts in the orthophotos created using the DEM).

DEM as a Cartographic Product
The altimetric information collected to represent the terrain surface is generally represented in the form a cartographic product, which is defined by a number of specifications.Traditional paper maps are specified with a scale, a cartographic projection and a legend that indicates how the map describes the terrain.Similarly, the DEM is characterized by metadata, i.e., data about the DEM itself, which make possible the access to the altimetric data in the digital file and provide information such as the producer, the date, the accuracy etc.In this section, we show that the Earth's surface has very special properties (Section 3.1) which influence the way a DEM must be specified, i.e., the structure of the planimetric grid (Section 3.2) and the numerical representation of the elevation (Section 3.3).

A Very Peculiar Surface
Unlike most physical surfaces represented by 3D models for robotics, architecture, industry etc., the Earth's relief has a very particular property: it can be considered as a 2.5D surface.Indeed, once the nominal terrain has been defined (e.g., the ground surface), a unique altitude z can be assigned to each horizontal position (x, y).This is due to the gigantic mass of the Earth (M ~6.10 24 kg) and the intensity of the associated gravitational field (g ~9.8 m•s −2 ), where the attraction of all material particles towards the mass center gives the Earth's spherical shape.As a result, the Earth's surface has almost no hidden part when seen from above.It is comparable to a bas-relief which has no hidden part when seen from the front.This property is not cancelled out by the centrifugal force produced by the Earth's rotation, which transforms the sphere into an ellipsoid.
A few exceptions exist, like overhang cliffs where more than one elevation value can be assigned to a given horizontal position.But they are rare enough to justify the general assumption that any vertical axis (from the Earth's center to the zenith) has a unique intersection with the topographic surface (Figure 3).This is the definition of a 2.5D surface and it has a very interesting advantage since the elevation can be considered as a bivariate function z = f(x, y).Indeed, most cartographic systems consist in separating two planimetric coordinates (latitude and longitude or equivalent) on the one hand, and a vertical coordinate (altitude above sea level) on the other hand.Therefore, the specification of a DEM as a cartographic product consists in specifying two aspects: the planimetric grid structure and the numerical representation of elevation values, beyond the classical specifications of all maps like datum and map projection.These specification data are essential.They are generally available as metadata with the DEM file.Our aim is not to recall this well-known information in detail, since it has been widely addressed in GIS literature [41], but to stress the need for an explicit specification as a condition for DEM quality assessment.

Planimetric Grid Sructure
Most relief mapping techniques consist of two steps, namely, altitude computation for a large number of terrain points, and resampling to fit the specifications of the output product when necessary.The technique implemented to compute terrain altitudes with specific acquisition and processing parameters has an impact on the positional accuracy of each point and on the density of the resulting point cloud.However, the DEM quality is not limited to these aspects since the subsequent resampling step could also have a major impact on the quality of the output DEM.Two important specifications are set up to define the resampling procedure, i.e., the grid structure and the interpolation algorithm.

Grid Structure
The grid structure defines the planimetric positions in which the altimetric values are to be stored.Two main grid structures may be considered, namely, regular (e.g., a square mesh raster grid) and irregular structures (e.g., a triangular irregular network).
The regular grid has clear advantages in terms of data handling.Indeed, the horizontal coordinates are implicit, and they theoretically do not need to be stored provided that they are logically defined by metadata, and the square mesh grid can be displayed directly on a screen or a printout with no need for further resampling.The regular grid structure is defined a priori (i.e., regardless of the distribution of the input point cloud), but most data acquisition methods provide irregularly distributed points so that an interpolation step is required to resample the data in order to fit the regular grid.Conversely, no resampling is required for the irregular structure, which is built a posteriori over previously acquired points.It is typically the case of the well-known TIN (triangular irregular network): this structure consists of a network of non-overlapping planar triangles built on the preliminary point cloud [42,43].Intermediate grid structures have been proposed, like for progressive or composite sampling [44,45], in which a regular grid is iteratively densified as long as the curvature exceeds a given threshold, leading to a semi-regular grid of variable density [46][47][48].This multiresolution modeling approach has gained interest for interactive terrain visualization [49].
The size and shape of the grid mesh must be specified with care due to their impact on the DEM quality.
• The mesh size (and therefore the grid density) has an impact on the DEM absolute accuracy, but also on the ability of the DEM to describe landforms as shown below in Section 4.2.This ability is intuitive, since a smaller mesh is expected to allow a better rendering of small topographic objects, but a signal processing approach may also provide criteria to optimize resampling and to detect DEM errors [50].The link between mesh size and resolution, and the way DEM quality assessment should consider these concepts, are addressed in Section 6. • The mesh shape has an influence on the adequacy of the surface modelling to local landforms.
Between the two most common grid structures, namely constant size squares and variable size triangles, the latter is clearly better suited to describe a surface morphology in which drainages, ridges and other slope breaks have variable orientations.However, there are several solutions to

Planimetric Grid Sructure
Most relief mapping techniques consist of two steps, namely, altitude computation for a large number of terrain points, and resampling to fit the specifications of the output product when necessary.The technique implemented to compute terrain altitudes with specific acquisition and processing parameters has an impact on the positional accuracy of each point and on the density of the resulting point cloud.However, the DEM quality is not limited to these aspects since the subsequent resampling step could also have a major impact on the quality of the output DEM.Two important specifications are set up to define the resampling procedure, i.e., the grid structure and the interpolation algorithm.

Grid Structure
The grid structure defines the planimetric positions in which the altimetric values are to be stored.Two main grid structures may be considered, namely, regular (e.g., a square mesh raster grid) and irregular structures (e.g., a triangular irregular network).
The regular grid has clear advantages in terms of data handling.Indeed, the horizontal coordinates are implicit, and they theoretically do not need to be stored provided that they are logically defined by metadata, and the square mesh grid can be displayed directly on a screen or a printout with no need for further resampling.The regular grid structure is defined a priori (i.e., regardless of the distribution of the input point cloud), but most data acquisition methods provide irregularly distributed points so that an interpolation step is required to resample the data in order to fit the regular grid.Conversely, no resampling is required for the irregular structure, which is built a posteriori over previously acquired points.It is typically the case of the well-known TIN (triangular irregular network): this structure consists of a network of non-overlapping planar triangles built on the preliminary point cloud [42,43].Intermediate grid structures have been proposed, like for progressive or composite sampling [44,45], in which a regular grid is iteratively densified as long as the curvature exceeds a given threshold, leading to a semi-regular grid of variable density [46][47][48].This multiresolution modeling approach has gained interest for interactive terrain visualization [49].
The size and shape of the grid mesh must be specified with care due to their impact on the DEM quality.

•
The mesh size (and therefore the grid density) has an impact on the DEM absolute accuracy, but also on the ability of the DEM to describe landforms as shown below in Section 4.2.This ability is intuitive, since a smaller mesh is expected to allow a better rendering of small topographic objects, but a signal processing approach may also provide criteria to optimize resampling and to detect DEM errors [50].The link between mesh size and resolution, and the way DEM quality assessment should consider these concepts, are addressed in Section 6.

•
The mesh shape has an influence on the adequacy of the surface modelling to local landforms.Between the two most common grid structures, namely constant size squares and variable size triangles, the latter is clearly better suited to describe a surface morphology in which drainages, ridges and other slope breaks have variable orientations.However, there are several solutions to build a network of triangles from a given point cloud.For example, the Delaunay triangulation is advantageous to avoid very elongated triangles (which are less suitable for spatial analysis), but it does not necessarily give the most accurate DEM nor the most respectful one of terrain shapes: it must therefore be constrained so that the edges of the triangles are placed along slope discontinuities such as drainages or other slope break lines.This is an advantage of triangle networks, which are able to adapt to the topography.

Interpolation Algorithm
Between the grid points, the surface model is built by interpolation so that the elevation and its derivatives can be determined at any planimetric position.The interpolation is based on the input points on the one hand, and on a generic mathematical surface model on the other hand.There are different interpolation algorithms and they can affect the quality of the resulting DEM [51].The interpolator has less influence on the quality of the DEM if the initial point cloud is dense or if the grid is irregular with selected points on the slope breaks.Indeed, in these cases the mesh of the surface model remains close to the ground and a smooth interpolator can be used.On the other hand, the interpolator has a great influence on the quality of the DEM if the grid points have a low density, or in the presence of holes (missing information) which can occur as a result of the presence of a cloud in photogrammetry or a forest in repeat-pass SAR interferometry.
A great variety of interpolators are used in geoscience for interpolating thematic data to produce raster maps [52], but they are not always relevant for DEMs, since the performances achieved for moisture or geochemical concentrations, for instance, have no reason to be effective for elevation.Some algorithms generate altimetric variability, like kriging or fractal zoom.These two different mathematical formalisms aim at calculating and analyzing the variability around the interpolated mesh and propagating it at a finer scale within the mesh [53].It is a way to guarantee geomorphologic realism as compared to usual interpolators [54].

Numerical Representation of Elevation Values
Elevation values are generally defined as altitudes, i.e., measured relative to a horizontal surface that models the Earth at sea level.It is a gravimetric definition of elevation, suitable for many applications based on hydrologic or hydraulic modelling.With the space age, a global reference was sought, such as a global geoid model, and the fact of locating points by satellite (e.g., GNSS) led to prefer a geometric definition of the reference surface, typically an ellipsoid such as WGS84 (World Geodetic System), easier to handle than a geoid model.The coexistence of geometric and gravimetric references requires conversions between systems, and it is a source of confusion [55][56][57].
Even if some software tools assign elevation values to surface elements (i.e., pixels), this information is only relevant when assigned to a Euclidean point, defined by its coordinates which are easy to store and used to compute distances, angles and other metrics.However, except high precision topometric techniques (e.g., total station and GNSS), all techniques compute a mean elevation over a local neighborhood.In a DEM generated by digital photogrammetry for example, the image matching process is based on the correlation between small windows of a given size, typically several pixels, and the resulting elevation, though assigned to a point position (x, y), is influenced by the elevation distribution over a terrain area [58].
The bit depth is also an important product specification for a DEM, since it has an impact on the data volume and on the vertical quantization.Early generation DEMs were generally coded on 8 bits because they were not very precise and computers had limited storage capacities.It was a limitation, mainly in plain areas where step-like artifacts could appear on the ground surface.Since the 1990s, the floating-point representation has been enabled by 16 or 32-bit encoding to represent decimal numbers, increasing the precision and the elevation range for DEM storage.A 16-bit depth is good enough for global DEMs such as SRTM, with a 5-10 m range of accuracy and a 1-m quantization unit [59][60][61].However, a 32-bit depth is preferable for high accuracy DEM as obtained by aerial or terrestrial photogrammetry or by laser scanning.
All these specifications are essential to guarantee the quality of a digital elevation model.

Main DEM Quality Assessment Approaches
The validation of a DEM consists in assessing its ability to fulfil user requirements based on adapted quality criteria.For each criterion, a quality assessment procedure can be implemented to decide whether a DEM is acceptable for a given use [62].In this section, we review the main quality assessment approaches.Although this review is not exhaustive, it illustrates the variety of existing methods commonly used by producers and users.We show that DEM quality assessment can be considered with or without ground reference data.In the first case, the DEM is compared with a reference data set (external validation), whereas in the second case, inconsistencies are sought within the DEM itself with no reference data (internal validation).

Comparison with Ground Control Data
The goal of the external validation is to assess the quality of a DEM using external reference data, which can be a sparse cloud of points, contour lines, topographic profiles, or a much more accurate DEM.A comprehensive overview of external validation methods can be found in [63].As mentioned in Section 3, a DEM is a digital 2.5D representation of the Earth's surface, in which the elevation is a bivariate function z = f(x, y).Thus, the quality of a DEM includes planimetric and altimetric components.Most of the studies aim at evaluating the vertical quality in a DEM, i.e., the altimetric quality rather than the planimetric quality.The usual approach consists in computing some statistical indicators based on the altimetric discrepancies between the DEM and the reference.As a rule of thumb, the reference dataset must fulfil two main requirements:

•
It must be much more accurate than the evaluated DEM.

•
It must be dense enough and well distributed over the area to allow meaningful statistical analysis, and even a spatial analysis of the error.
The altimetric error of a DEM is composed of three main components, namely, gross, systematic and random errors [64,65] as shown in Figure 4.A systematic error is a bias between the modelled surface and the ground truth and it depends on the production technique, especially the data acquisition configuration, but also on the interpolation method [66].It can be calculated using the average value of the elevation difference between the DEM and the reference data.Random errors are mainly due to the production technique and they are mostly influenced by the quality of the raw data, the processing parameters, as well as the terrain morphology and the vegetation [67].They are evaluated through the calculation of the standard deviation of the elevation difference.Gross errors are outliers resulting from faults during the production of the DEM.

Main DEM Quality Assessment Approaches
The validation of a DEM consists in assessing its ability to fulfil user requirements based on adapted quality criteria.For each criterion, a quality assessment procedure can be implemented to decide whether a DEM is acceptable for a given use [62].In this section, we review the main quality assessment approaches.Although this review is not exhaustive, it illustrates the variety of existing methods commonly used by producers and users.We show that DEM quality assessment can be considered with or without ground reference data.In the first case, the DEM is compared with a reference data set (external validation), whereas in the second case, inconsistencies are sought within the DEM itself with no reference data (internal validation).

Comparison with Ground Control Data
The goal of the external validation is to assess the quality of a DEM using external reference data, which can be a sparse cloud of points, contour lines, topographic profiles, or a much more accurate DEM.A comprehensive overview of external validation methods can be found in [63].As mentioned in Section 3, a DEM is a digital 2.5D representation of the Earth's surface, in which the elevation is a bivariate function z = f(x, y).Thus, the quality of a DEM includes planimetric and altimetric components.Most of the studies aim at evaluating the vertical quality in a DEM, i.e., the altimetric quality rather than the planimetric quality.The usual approach consists in computing some statistical indicators based on the altimetric discrepancies between the DEM and the reference.As a rule of thumb, the reference dataset must fulfil two main requirements:

•
It must be much more accurate than the evaluated DEM.

•
It must be dense enough and well distributed over the area to allow meaningful statistical analysis, and even a spatial analysis of the error.
The altimetric error of a DEM is composed of three main components, namely, gross, systematic and random errors [64,65] as shown in Figure 4.A systematic error is a bias between the modelled surface and the ground truth and it depends on the production technique, especially the data acquisition configuration, but also on the interpolation method [66].It can be calculated using the average value of the elevation difference between the DEM and the reference data.Random errors are mainly due to the production technique and they are mostly influenced by the quality of the raw data, the processing parameters, as well as the terrain morphology and the vegetation [67].They are evaluated through the calculation of the standard deviation of the elevation difference.Gross errors are outliers resulting from faults during the production of the DEM.As any other geographic data, the elevation in a DEM has a spatial behaviour and so does its error.Indeed, although a part of the elevation error is random, it is often spatially autocorrelated, which means that neighboring points tend to have comparable errors [68,69].Thus, the random error is a combination of two types of random variables: one is spatially autocorrelated whereas the other As any other geographic data, the elevation in a DEM has a spatial behaviour and so does its error.Indeed, although a part of the elevation error is random, it is often spatially autocorrelated, which means that neighboring points tend to have comparable errors [68,69].Thus, the random error Remote Sens. 2020, 12, 3522 9 of 36 is a combination of two types of random variables: one is spatially autocorrelated whereas the other is a pure noise [70].This autocorrelation can also be anisotropic [71].Then, the elevation of a particular point in a DEM is the sum of its true elevation and the aforementioned errors: where ẑi is the altitude in the DEM, z i is the true altitude on the terrain, µ is the systematic error (bias), ε i is the spatially autocorrelated random error, and ε i is the spatially non-autocorrelated random error (pure noise).External quality assessment consists in analyzing the whole set of elevation discrepancies (µ + ε i + ε i ), calculated between the DEM and a set of ground control points (GCPs), based on simple statistical indicators such as the mean, the standard deviation, the root mean square error (RMSE), the maximum error etc. [69].These indicators are used to assess the DEM error in terms of altimetric accuracy and precision [67].Note that the terms related to data uncertainty are often used in a confusing way [72].The third edition of the Photogrammetric Terminology [73] defines accuracy as "the closeness of the result of a measurement, calculation or process to the true, intended or standard value", and precision as "the repeatability of a result, including the number of significant figures quoted, regardless of its correctness" (further clarification on the meaning of these words and on the pitfalls of translation can be found in [74]).
The most commonly used statistical indicator is the RMSE [65], which provides a reliable indication on the altimetric DEM error considering that the sample size N of the reference data (i.e., the number of GCPs) is big enough [43]: The relationship between the RMSE, the mean, and the standard deviation of the error is the following: The statistical distribution of the error is often Gaussian [28], although some studies suggest that this hypothesis is not always valid [75].To verify the validity of this hypothesis, Q-Q plots (quantile-quantile plots [63,75]) can be used.This can be helpful to reveal a non-normal distribution and to suggest more adapted statistical indicators for error analysis, like the median, which is more robust, i.e., not affected by extreme error values.If we consider that the error follows a normal distribution N(µ,σ), then it is possible to define the confidence level of the error that corresponds to the percentage of points whose error value lies in the confidence interval.For instance, the intervals [µ − σ; µ + σ] and [µ − 1.96σ; µ + 1.96σ] correspond to linear errors with 68% (LE68) and 95% (LE95) confidence levels, respectively.
The elevation error propagates through elevation derivatives like slope, aspect and curvature, leading to erroneous drainage network or watershed delineation for instance [76,77].Although the computation of the slope or other derivatives may lead to different results according to the selected algorithms [78][79][80][81][82], the error propagation always generates an uncertainty in the resulting product (e.g., slope map) that needs to be evaluated due to its wide use in many applications of DEMs in geoscience [68,[82][83][84].In general, the elevation derivatives are very sensitive to the spatial autocorrelation of the error [28].Indeed, since the elevation derivatives are calculated using a neighborhood, their quality depends on the error of the points from which they have been extracted as well as on the autocorrelation of their errors.Then, it is possible to have a low positional quality but a high quality of the derivatives as in the case of a big systematic error and a small random error, and vice versa.
Most DEMs are delivered with a simple value of the RMSE or a standard deviation of the elevation error, and this value is considered as representative of the overall quality of the DEM, where the spatial distribution of the error is completely ignored [71].This is due to the fact that the evaluation of the spatial autocorrelation of the error requires a dense sample of control points, which in many cases is not available.However, although the RMSE is well adapted for the positional quality assessment of a DEM, it may not be informative about the quality of elevation derivatives since it does not consider the spatial behaviour of the error [27].This behaviour can reveal systematic trends due to inaccurate orbit or sensor model parameters as well as local error autocorrelation, often due to the influence of landscape characteristics, e.g., higher error in forested or mountainous areas.Thus, to assess the error propagation from the elevation to its derivatives, an error model has to be built (Figure 5).
or mountainous areas.Thus, to assess the error propagation from the elevation to its derivatives, an error model has to be built (Figure 5).
This error model includes the elevation error as well as its autocorrelation and it can be modelled using a semivariogram.The following formula represents an exponential error model [85]: where γ is the semivariance, h is the horizontal distance, σ is the sill and φ is the range.An error model having a null nugget (i.e., y-intercept), a minimal sill and a maximum practical range provides a high spatial autocorrelation and then a high quality of elevation derivatives.The generation of the model is based on the second-order stationary hypothesis, which supposes that the mean and the standard deviation of the error are constant in the DEM and that the autocorrelation of the error is isotropic [69].The isotropic hypothesis, which is not necessarily verified in the natural relief due to tectonic or hydrographic phenomena, may also be questioned by the DEM production process, i.e., the data acquisition geometry (as in the case of side looking radar), the processing method (image matching along lines) or the grid structure (raster effect) [86].The error propagation can be measured either analytically when possible or numerically through simulation-based (e.g., Monte Carlo) methods when it is not analytically possible [68].The goal is to evaluate the impact of data uncertainty on the subsequent extracted derivatives [77,85,87,88].For instance, the analytically measured error propagation in slope (as it is the most important morphological index [89]) calculated by trigonometry is the following [90]: where S is the slope, d is the horizontal distance separating points 1 and 2, σ is the error of the slope and r is the autocorrelation coefficient of the error.Moreover, in the case of finite differences, which give a more accurate slope estimation [91], the slope error for a third-order finite This error model includes the elevation error as well as its autocorrelation and it can be modelled using a semivariogram.The following formula represents an exponential error model [85]: where γ is the semivariance, h is the horizontal distance, σ 2 z is the sill and ϕ is the range.An error model having a null nugget (i.e., y-intercept), a minimal sill and a maximum practical range provides a high spatial autocorrelation and then a high quality of elevation derivatives.The generation of the model is based on the second-order stationary hypothesis, which supposes that the mean and the standard deviation of the error are constant in the DEM and that the autocorrelation of the error is isotropic [69].The isotropic hypothesis, which is not necessarily verified in the natural relief due to tectonic or hydrographic phenomena, may also be questioned by the DEM production process, i.e., the data acquisition geometry (as in the case of side looking radar), the processing method (image matching along lines) or the grid structure (raster effect) [86].
The error propagation can be measured either analytically when possible or numerically through simulation-based (e.g., Monte Carlo) methods when it is not analytically possible [68].The goal is to evaluate the impact of data uncertainty on the subsequent extracted derivatives [77,85,87,88].For instance, the analytically measured error propagation in slope (as it is the most important morphological index [89]) calculated by trigonometry is the following [90]: where S is the slope, d is the horizontal distance separating points 1 and 2, σ S is the error of the slope and r z 1 z 2 is the autocorrelation coefficient of the error.Moreover, in the case of finite differences, which give a more accurate slope estimation [91], the slope error for a third-order finite difference weighted by reciprocal of squared distance method [79,[92][93][94] is obtained by the following formula [85]: where w is the mesh size and C is the spatial autocovariance.
In a numerical approach, a multiple realization of DEMs is produced based on the error model, where the precise realization is considered to be one of them.For each realization, the targeted derivative is extracted, which provides a sample for this derivative.Then, a probabilistic method is used to evaluate the quality of this derivative [25,85].
All the aforementioned descriptors aim at evaluating the absolute vertical quality and the derivative quality.In the horizontal dimension, a planimetric shift in the input point cloud has an impact on DEM elevations that behaves as a planimetric error, which can also be evaluated in both absolute and relative terms.The absolute planimetric quality is obtained by comparing the planimetric positions to reference data and adapted statistical descriptors are used to quantify the error like for the altimetric error as shown above.The relative positional quality is measured between points identified in the DEM.For instance, the Euclidian distance between two points is obtained using the coordinates of these points; if we consider these two points have the same error, then the measured distance is not affected by this error, as in the case of a constant planimetric shift of the DEM.

Simulation-Based DEM Production Method Validation
An interesting extension of external validation consists in processing simulated images to derive a DEM and to compare the output DEM with the input DEM used to simulate the synthetic image dataset.In this case, the goal is not to evaluate a particular DEM, but a topographic method used for DEM production.
This approach implies that computational tools must be available to simulate realistic images based on a landscape model and a sensor model.Such tools are developed to study radiative transfer processes and to test remote sensing strategies [95,96].The landscape model includes its geometry (DEM and objects over the ground: trees, buildings etc.) and all the characteristics that are relevant to compute the electromagnetic signal received by the sensor over this particular landscape.The sensor model includes the description of the sensor itself (spectral interval and any technical property needed to define the geometry of the images and to compute pixel radiometric values) as well as the acquisition geometry through the platform position and orientation typically.
Figure 6 shows the workflow that can be implemented for simulation-based validation.Comparing the input and output DEMs allows the estimation of the error produced by the topographic restitution method as well as its spatial behaviour.It can also help to better understand the effects of error propagation.This approach has long been implemented to evaluate the expected performances of image acquisition strategies as well as processing algorithms for different mapping methods based on remote sensing data, i.e., photogrammetry [97], lidar [98], and radar methods [99,100].More recently it has also been applied to terrestrial mobile mapping [101].Although this approach may suffer from a lack of realism that could create biases in the analysis, it has several advantages such as the possibility to vary the experimental conditions: • The topographic restitution method can be tested over a variety of landscapes, leading to more comprehensive conclusions than over a particular DEM.• It can also be tested with a variety of sensor parameters and orbit configurations, leading to recommendations for optimum image acquisition conditions if the images are to be processed for DEM production.
Regarding the lack of realism, Figure 6 suggests some refinements to overcome this limitation, for instance: • The input DEM can be reprocessed, either to exaggerate the elevation amplitude, or to introduce topographic details such as microrelief or buildings (note that all these changes in the input DEM can be parametrically controlled, allowing analytical sensitivity studies).Fractal resampling can be implemented to produce a more realistic input DEM [102], and geomorphology provides criteria to verify this realism requirement in accordance with the principles of internal validation which we will see further [103].• Sensor parameter uncertainties can be introduced to consider the fact that the DEM production method always uses an approximation of the exact parameters.
Since a dense reference dataset is available, the principles of external quality assessment can be fully applied.However, DEM series based on simulated images can also be evaluated with an internal quality assessment approach, i.e., with no reference data.

Internal Quality Assessment
Internal quality assessment is implemented with no ground control.It makes use of criteria of realism based on a priori knowledge of the general behaviour of all topographic surfaces (intuition or geomorphological science).Although this approach may suffer from a lack of realism that could create biases in the analysis, it has several advantages such as the possibility to vary the experimental conditions:

•
The topographic restitution method can be tested over a variety of landscapes, leading to more comprehensive conclusions than over a particular DEM.

•
It can also be tested with a variety of sensor parameters and orbit configurations, leading to recommendations for optimum image acquisition conditions if the images are to be processed for DEM production.
Regarding the lack of realism, Figure 6 suggests some refinements to overcome this limitation, for instance:

•
The input DEM can be reprocessed, either to exaggerate the elevation amplitude, or to introduce topographic details such as microrelief or buildings (note that all these changes in the input DEM can be parametrically controlled, allowing analytical sensitivity studies).Fractal resampling can be implemented to produce a more realistic input DEM [102], and geomorphology provides criteria to verify this realism requirement in accordance with the principles of internal validation which we will see further [103].

•
Sensor parameter uncertainties can be introduced to consider the fact that the DEM production method always uses an approximation of the exact parameters.
Since a dense reference dataset is available, the principles of external quality assessment can be fully applied.However, DEM series based on simulated images can also be evaluated with an internal quality assessment approach, i.e., with no reference data.

Internal Quality Assessment
Internal quality assessment is implemented with no ground control.It makes use of criteria of realism based on a priori knowledge of the general behaviour of all topographic surfaces (intuition or geomorphological science).

Visual Control
As recalled above, land surfaces are very familiar to each of us, so that we can easily complain about a lack of realism in their representation, even in unknown regions.This makes visual control a very powerful approach, although often neglected, for DEM quality assessment [104].The expectations with regards to relief modelling have long inspired poets and painters, though with a noticeable progress in the past centuries.For instance, the mountainous background landscape behind the Mona Lisa portrait by Leonardo da Vinci (Figure 7), though inspired by real Italian landscapes [105], is clearly unrealistic.At Leonardo's time the landscape was a mere backdrop to fill the background of portraits or scenes, until the Dutch painters of the 17th century began to find interest in the landscape itself, leading to more realistic shapes (whether observed or imagined).This natural expectation with regards to relief shape modelling explains why it is so natural to rely on visual control for DEM quality assessment.

Visual Control
As recalled above, land surfaces are very familiar to each of us, so that we can easily complain about a lack of realism in their representation, even in unknown regions.This makes visual control a very powerful approach, although often neglected, for DEM quality assessment [104].The expectations with regards to relief modelling have long inspired poets and painters, though with a noticeable progress in the past centuries.For instance, the mountainous background landscape behind the Mona Lisa portrait by Leonardo da Vinci (Figure 7), though inspired by real Italian landscapes [105], is clearly unrealistic.At Leonardo's time the landscape was a mere backdrop to fill the background of portraits or scenes, until the Dutch painters of the 17th century began to find interest in the landscape itself, leading to more realistic shapes (whether observed or imagined).This natural expectation with regards to relief shape modelling explains why it is so natural to rely on visual control for DEM quality assessment.Two usual methods are based on grey levels to represent DEMs, namely, hypsometry and hillshade (or shadowing).Hypsometry assigns a grey level to an elevation interval so that a continuous grey scale is available to represent the vertical amplitude from the lowest point (black) to the highest one (white), whereas hillshade consists in illuminating the ground surface by a virtual light source, preferably located in the Northern side (in the general case where the map is oriented with North upwards) so that the landforms are illuminated from the top to the bottom of the screen, which is the usual way to see the real land surface.
Since most DEM artifacts have significantly more impact on slope than on elevation, we can expect that hillshade (in which grey level variations result from slope variations) is more efficient than hypsometry for shape quality assessment.DEM viewing strategies based on hillshade are more efficient for detecting gross errors [106] and resampling artifacts [107].This is confirmed by Figure 8: the river network at the North of the area has almost completely disappeared in the DEM with Gaussian noise; step-like artifacts appear in the 8-bit coded DEM; small drainages disappear in the DEM subsampled 4 times.These effects are very clear in hillshade but not in hypsometry.
A challenge of DEM visualization is to represent a 2.5D surface on a 2D screen.In the examples of hypsometry and hillshade the grey scale is devoted to representing either the elevation or the slope.However, an integrated relief visualization product can be generated by using a color scale for hypsometry and the grey scale for hillshade.Such a product is more complete than mere shadowing thanks to its color-induced elevation perception, and it can even be transformed into a 3D perspective view to offer an increased perception of elevations.However, the hillshade effect remains the major contribution to visual quality control since it is more sensitive to potential artifacts or unrealistic shapes than the hypsometric effect.Two usual methods are based on grey levels to represent DEMs, namely, hypsometry and hillshade (or shadowing).Hypsometry assigns a grey level to an elevation interval so that a continuous grey scale is available to represent the vertical amplitude from the lowest point (black) to the highest one (white), whereas hillshade consists in illuminating the ground surface by a virtual light source, preferably located in the Northern side (in the general case where the map is oriented with North upwards) so that the landforms are illuminated from the top to the bottom of the screen, which is the usual way to see the real land surface.
Since most DEM artifacts have significantly more impact on slope than on elevation, we can expect that hillshade (in which grey level variations result from slope variations) is more efficient than hypsometry for shape quality assessment.DEM viewing strategies based on hillshade are more efficient for detecting gross errors [106] and resampling artifacts [107].This is confirmed by Figure 8: the river network at the North of the area has almost completely disappeared in the DEM with Gaussian noise; step-like artifacts appear in the 8-bit coded DEM; small drainages disappear in the DEM subsampled 4 times.These effects are very clear in hillshade but not in hypsometry.
A challenge of DEM visualization is to represent a 2.5D surface on a 2D screen.In the examples of hypsometry and hillshade the grey scale is devoted to representing either the elevation or the slope.However, an integrated relief visualization product can be generated by using a color scale for hypsometry and the grey scale for hillshade.Such a product is more complete than mere shadowing thanks to its color-induced elevation perception, and it can even be transformed into a 3D perspective view to offer an increased perception of elevations.However, the hillshade effect remains the major contribution to visual quality control since it is more sensitive to potential artifacts or unrealistic shapes than the hypsometric effect.A visual approach of DEM quality assessment can also be applied to vector information derived from the DEM.This is usually done for vector data quality control in GIS, consisting in verifying topological properties such as connectivity [41].In the case of DEM quality assessment, this approach can typically be applied to the drainage network, which may have geometrical and topological errors, as illustrated in Figure 9.However, it is not straightforward to decide if the artifacts observed in the extracted drainage network are caused by the DEM or by the drainage extraction algorithm, although it could be feasible by comparing different hydrographic networks extracted from the same DEM with different algorithms.A visual approach of DEM quality assessment can also be applied to vector information derived from the DEM.This is usually done for vector data quality control in GIS, consisting in verifying topological properties such as connectivity [41].In the case of DEM quality assessment, this approach can typically be applied to the drainage network, which may have geometrical and topological errors, as illustrated in Figure 9.However, it is not straightforward to decide if the artifacts observed in the extracted drainage network are caused by the DEM or by the drainage extraction algorithm, although it could be feasible by comparing different hydrographic networks extracted from the same DEM with different algorithms.
The visual analysis of a raster DEM or a vector product derived from a DEM, which can be helpful for preliminary control, is an inductive approach based on our repeated experience of landform viewing in everyday life.It is an intuitive implementation of internal DEM quality assessment, which can also be implemented with quantitative criteria, although with no ground control.from the DEM.This is usually done for vector data quality control in GIS, consisting in verifying topological properties such as connectivity [41].In the case of DEM quality assessment, this approach can typically be applied to the drainage network, which may have geometrical and topological errors, as illustrated in Figure 9.However, it is not straightforward to decide if the artifacts observed in the extracted drainage network are caused by the DEM or by the drainage extraction algorithm, although it could be feasible by comparing different hydrographic networks extracted from the same DEM with different algorithms.

Quantitative Internal Quality Assessment
Internal quality assessment is based on the hypothesis that all topographic surfaces are supposed to fulfil some universal rules.This approach is relevant to evaluate the quality of a DEM in terms of shape rendering, for which external quality assessment is less straightforward.Indeed, the comparison of a terrain shape as modelled in the evaluated DEM with regards to a reference dataset is limited by the difficulty of defining a suitable metric, and by the fact that height derivatives (slope, aspect, curvature) and therefore shapes are not stable with regards to scale, while elevation is stable [109][110][111], as discussed below in Section 6.Therefore, the comparison of two DEMs, which may not necessarily have the same native mesh size, may bring some information on elevation accuracy, but probably not on slope accuracy or shape realism.For this reason, it is more convenient to evaluate the geomorphological realism of a DEM through a rule-based approach, i.e., by analyzing its compliance to a number of general rules that the terrestrial relief is supposed to fulfil.
Two different requirement levels can be considered, which may be called strong and weak requirements [108].A strong shape modelling requirement is defined by physical rules and a terrain that does not fulfil them is impossible.A typical example is the downward streaming of the water.A practical consequence is that the detection of local sinks along the hydrographic network provides a criterion to locate and quantify DEM errors [108,112], except in the case of natural sinkholes (dolines) which only appear in specific geological contexts.Such a quality assessment method is easy to implement, and it does not need any reference data.Once the hydrographic network has been automatically extracted from the DEM, the sinks are detected, and their occurrence is analyzed.Sink density and mean depth are quantitative criteria that are easy to calculate and useful to evaluate the compliance of this physical rule.Their spatial distribution is also an indicator of error zones.
A weak shape modelling requirement is more difficult to define.It is based on statistical rules and a terrain that does not fulfil them is improbable, although not impossible.It is a quantitative implementation of artifact detection, with criteria to identify unrealistic shapes in a DEM [113].In both cases (i.e., visual or quantitative) the lack of realism is assessed based on geomorphological skills.The most usual realism criteria are inspired by the principle of the first law of geography, which states that "everything is related to everything else, but near things are more related than distant things" [114,115].Some rules are based on the fractal behaviour of the topographic surface for terrains modelled by hydric erosion, which is the most general case on the Earth's surface [116,117].
A typical rule is provided by Horton's law, also based on the fractal hypothesis, which states that if the river streams are classified by Strahler orders [118], the total number of streams of a given order decreases in a geometric progression when the order increases [119].In other terms, the logarithm of the total number of streams of order n decreases linearly as a function of n. Figure 10 illustrates an interesting contribution of Horton's law in a study of the quality of the Topodata DEM obtained by oversampling SRTM in Brazil [120].As expected according to Horton's law, a linear tendency with R 2 = 0.9836 is observed over 12 orders (Figure 10a), which means that the relief described by the DEM is nearly fractal.However, it appears that excluding the values obtained for orders 1 and 2, which are slightly higher than the linear trend, and plotting the results for orders 3 to 12 only, increases the linearity with R 2 = 0.9965 (Figure 10b).This suggests that the Topodata resampling process slightly increases the number of short streams with regards to a fractal hypothesis, which could be confirmed in the evaluation of stream extraction among several DEMs (Topodata and SRTM included) by [121].Similarly, if the area is divided in watersheds, the cumulative area of the n largest watersheds increases linearly with n.All these rules aim at verifying that the DEM will provide reliable indicators for applications based on landforms, which cannot be guaranteed by the absolute positional accuracy as calculated in external DEM quality assessment.
Since internal quality assessment consists in confronting the data with a rule (Horton law, rivers always streaming downward, etc.), it is worth noting that the discrepancies between the data and the ideal behaviour can provide quality indicators, such as the number and the mean depth of the sinks along the drainages (and this would even be a criterion to validate sink removal methods), or the R² coefficient in the control of the compliance with Horton's law.This is why the internal quality assessment method, although not using any reference data, can really be considered as a quantitative approach, with objective criteria to decide if a DEM is acceptable or not.
This statistical approach of internal quality assessment, which consists in looking for unrealistic shapes in a DEM, has similarities with the use of the famous Benford's law [122,123] to detect financial frauds based on the automated detection of unrealistic statistical distributions in the accounts [124].Indeed, this law uses a very widely verified property: in most series of numbers found in many fields of science, economy, sports etc. the occurrence of n (n = 1 to 9) as the first digit of each number decreases rapidly as a function of n, following approximately log (1 + 1/n), with 1 in 30% of the numbers, 2 in 18% and so on.Accounts that clearly depart from this rule are suspected to have been falsified.The authors have shown that Benford's law can reveal unrealistic morphologies in DEMs provided that an appropriate metric is used for the series of numbers.A study carried out on a series of DEMs shows that elevation is not relevant, while slope and Strahler order respect the expected behaviour and can therefore reveal artifacts [125], as illustrated in Figure 11.Similarly, if the area is divided in watersheds, the cumulative area of the n largest watersheds increases linearly with n.All these rules aim at verifying that the DEM will provide reliable indicators for applications based on landforms, which cannot be guaranteed by the absolute positional accuracy as calculated in external DEM quality assessment.
Since internal quality assessment consists in confronting the data with a rule (Horton law, rivers always streaming downward, etc.), it is worth noting that the discrepancies between the data and the ideal behaviour can provide quality indicators, such as the number and the mean depth of the sinks along the drainages (and this would even be a criterion to validate sink removal methods), or the R 2 coefficient in the control of the compliance with Horton's law.This is why the internal quality assessment method, although not using any reference data, can really be considered as a quantitative approach, with objective criteria to decide if a DEM is acceptable or not.
This statistical approach of internal quality assessment, which consists in looking for unrealistic shapes in a DEM, has similarities with the use of the famous Benford's law [122,123] to detect financial frauds based on the automated detection of unrealistic statistical distributions in the accounts [124].Indeed, this law uses a very widely verified property: in most series of numbers found in many fields of science, economy, sports etc. the occurrence of n (n = 1 to 9) as the first digit of each number decreases rapidly as a function of n, following approximately log (1 + 1/n), with 1 in 30% of the numbers, 2 in 18% and so on.Accounts that clearly depart from this rule are suspected to have been falsified.The authors have shown that Benford's law can reveal unrealistic morphologies in DEMs provided that an appropriate metric is used for the series of numbers.A study carried out on a series of DEMs shows that elevation is not relevant, while slope and Strahler order respect the expected behaviour and can therefore reveal artifacts [125], as illustrated in Figure 11.
Internal quality assessment can even be more powerful when the DEM production method is known including the parameters used for image acquisition, elevation calculation and resampling.Indeed, it is then possible to know what to expect from the quality of the product and this can guide the search for artifacts and make it easier to distinguish between existing landforms and processing artifacts.This approach is particularly interesting to reveal resampling effects as illustrated in the following two examples.
of each number decreases rapidly as a function of n, following approximately log (1 + 1/n), with 1 in 30% of the numbers, 2 in 18% and so on.Accounts that clearly depart from this rule are suspected to have been falsified.The authors have shown that Benford's law can reveal unrealistic morphologies in DEMs provided that an appropriate metric is used for the series of numbers.A study carried out on a series of DEMs shows that elevation is not relevant, while slope and Strahler order respect the expected behaviour and can therefore reveal artifacts [125], as illustrated in Figure 11.Internal quality assessment can even be more powerful when the DEM production method is known including the parameters used for image acquisition, elevation calculation and resampling.Indeed, it is then possible to know what to expect from the quality of the product and this can guide the search for artifacts and make it easier to distinguish between existing landforms and processing In the first example [126], a DEM obtained by interpolation of contour lines with a mesh of 40 m was evaluated using a fractal description of the terrestrial relief.Modelling the semivariogram led to a much lower fractal dimension for distances of 40 to 200 m (D = 2.07) than for distances of 400 to 1200 m (D = 2.25), and the horizontal distance between the contour lines was most often between 200 and 400 m, which confirms that the interpolator had a smoothing effect.In addition, the fractal dimension over short distances revealed an anisotropy of the interpolation by showing that the DEM was even smoother in the directions NS and EW (directions of the interpolator) than in the diagonal directions.This is a fractal approach which is not so different from the use of information entropy as a measure of DEM quality [127].
The second example is part of a study previously cited [108].The directional histograms of the aspect (direction of the maximum slope) were compared between two DEMs, namely Topodata and SRTM.The result in Figure 12 shows the same overall elliptic shape in both histograms, related with the general trend in the regional topography, but the SRTM aspect histogram exhibits higher occurrences in a few specific directions, which correspond exactly to the principal directions of the raster grid, while Topodata seems to have redistributed the aspects in a more realistic way.artifacts.This approach is particularly interesting to reveal resampling effects as illustrated in the following two examples.
In the first example [126], a DEM obtained by interpolation of contour lines with a mesh of 40 m was evaluated using a fractal description of the terrestrial relief.Modelling the semivariogram led to a much lower fractal dimension for distances of 40 to 200 m (D = 2.07) than for distances of 400 to 1200 m (D = 2.25), and the horizontal distance between the contour lines was most often between 200 and 400 m, which confirms that the interpolator had a smoothing effect.In addition, the fractal dimension over short distances revealed an anisotropy of the interpolation by showing that the DEM was even smoother in the directions NS and EW (directions of the interpolator) than in the diagonal directions.This is a fractal approach which is not so different from the use of information entropy as a measure of DEM quality [127].
The second example is part of a study previously cited [108].The directional histograms of the aspect (direction of the maximum slope) were compared between two DEMs, namely Topodata and SRTM.The result in Figure 12 shows the same overall elliptic shape in both histograms, related with the general trend in the regional topography, but the SRTM aspect histogram exhibits higher occurrences in a few specific directions, which correspond exactly to the principal directions of the raster grid, while Topodata seems to have redistributed the aspects in a more realistic way.Hence when the topographic indices highlight well known characteristics of the sampling method, it confirms that it is probably not the terrain that naturally has these characteristics but an artifact in the DEM.

DEM Validation at Different Levels
The different methods reviewed in the previous section for the detection and evaluation of DEM errors can be considered at several levels.Indeed, several steps follow one another from preliminary terrain observation to application-driven product generation.The error propagates from a level to the next one and quality requirements can be expressed at each level.In this section, we consider quality requirement transfer from point cloud to grid surface model (Section 5.1) and Hence when the topographic indices highlight well known characteristics of the sampling method, it confirms that it is probably not the terrain that naturally has these characteristics but an artifact in the DEM.

DEM Validation at Different Levels
The different methods reviewed in the previous section for the detection and evaluation of DEM errors can be considered at several levels.Indeed, several steps follow one another from preliminary terrain observation to application-driven product generation.The error propagates from a level to the next one and quality requirements can be expressed at each level.In this section, we consider quality requirement transfer from point cloud to grid surface model (Section 5.1) and from grid surface model to derived topographic features (Section 5.2).Moreover, the quality assessment of a global DEM is not straightforward when based of local data analysis (Section 5.3).

From Point Cloud to Grid Surface Model
The first result of elevation computation is a point cloud in which the individual quality of each point is influenced by the selected technique and by the parameters chosen for data acquisition (sensor, orbit . . .[128][129][130][131][132][133][134][135]) and processing (e.g., matching template size, radar slave image interpolation method . . .[84,[136][137][138][139][140]), as well as by the local slope and the landcover [84,[140][141][142][143][144][145].There is an almost linear relationship between the DEM elevation error and the terrain slope [146][147][148][149], although the impact on the landform quality is more severe in low-slope areas [85].This point cloud can be obtained directly in the case of laser scanning methods, or through image processing techniques in the case of photogrammetry or SAR interferometry for instance.The output DEM is a surface model obtained from the point cloud with a specific grid structure (e.g., raster or TIN), using an interpolation process if resampling is necessary.The grid surface model inherits the errors of the input point cloud, but the resampling process contributes too.For example, comparing different base to height ratios (or different sizes of the matching template) in respective DEMs is not fully relevant if further resampling has modified the initial DEMs.
In fact, the relative contribution of the input data and the interpolation depends on the criterion considered to evaluate the DEM:

•
In terms of absolute vertical accuracy, the input data are essential: the errors of the input point cloud are generally autocorrelated (due to orbit, relief . . . ) so that they remain in the resampled DEM whatever the selected interpolation method.

•
In terms of shape realism, the interpolation plays a major role since it can remove or create artifacts (noise, stripe, pixelation, etc.).The resampling step may filter the noise of the input point cloud and therefore improve the quality in terms of elevation accuracy.In contrast, the interpolation implemented for resampling may produce an exaggerated smoothing effect or a raster grid effect, resulting in a quality degradation in terms of shape rendering.Indeed, the choice of an interpolation algorithm affects not only the elevation absolute accuracy but also the geomorphometric indices [150], and this effect depends on the local terrain morphology [151][152][153][154].The effect of an interpolator on the DEM quality depends on both the DEM production technique and the application-oriented user requirements [43,155,156].Hengl and Evans [70] distinguish three aspects for classifying interpolation methods, namely, the smoothing effect (exact or approximate interpolation), the proximity effect (local or global interpolation) and the stochastic hypothesis.For example, an exact interpolation method, such as linear interpolation, is recommended if the data are very dense and accurate, while a smoothing method should be used if the data are noisy.Mitas and Mitasova [157] state that the description of the smoothing and tension effects and the consideration of ridges and streams are the most important evaluation criteria for an interpolation method.
Another study by [152] also tested the relevance, at different scales and according to the terrain morphology, of five interpolators, namely, inverse distance weighting (IDW), ordinary (OK) and universal (UK) kriging, multiquadratic radial basis function (MRBF), and regularized spline with tension (RST).This study concludes that the impact of the different interpolation methods depends on the density of the original sample, i.e., the interpolation methods show small differences with a high sample density, whereas this difference becomes more important when the sample density is low.This is quite an intuitive result, since with a low density, the elevation calculation depends more on the interpolator than on the data.On a low density sample and on different scales, the IDW provides more accurate results in mountainous regions, whereas the OK performs best in regions with naturally smooth relief, which is consistent with [153].Finally, the authors conclude that three criteria must be taken into consideration when selecting the interpolation method: type of relief, sample density, and applicability to different spatial scales.

From Grid Surface Model to Derived Topographic Features
Many applications consist in analyzing topographic objects extracted from the DEM [158][159][160][161][162]. It is not the purpose of this article to deal with the exploitation of these objects, which is specific to each application, but it is relevant to consider the defects of these extracted objects as indicators of the defects of the DEM.This is particularly the case of the hydrographic network, which has universal properties that the DEM must respect.The errors of the extracted drainage network depend on the one hand on the errors of the DEM, and on the other hand on the extraction algorithm [163][164][165].These two influences cannot be considered separately because a robust algorithm can compensate for the errors of the DEM by "forcing" the runoff despite a noisy profile.Figure 13 shows the main types of errors in the drainage network extracted from a DEM.The DEM errors have an impact on the network geometry (a: absolute location error; b: unrealistic shape) or on its topology (c: wrong topology although possible; d: impossible topology).It should be noted that the error detection requires reference data for (a) and (c), while no reference is needed for (d); (b) is an intermediate situation in which unrealistic shapes can be detected in the DEM itself but a reference network or other external information such as a remote sensing image can help to confirm the interpretation.Therefore, both external and internal quality assessment methods, as defined in Section 4, provide a complete set of criteria to evaluate the quality of a topographic object derived from the DEM, such as the hydrographic network.This is particularly the case of the hydrographic network, which has universal properties that the DEM must respect.The errors of the extracted drainage network depend on the one hand on the errors of the DEM, and on the other hand on the extraction algorithm [163][164][165].These two influences cannot be considered separately because a robust algorithm can compensate for the errors of the DEM by "forcing" the runoff despite a noisy profile.Figure 13 shows the main types of errors in the drainage network extracted from a DEM.The DEM errors have an impact on the network geometry (a: absolute location error; b: unrealistic shape) or on its topology (c: wrong topology although possible; d: impossible topology).It should be noted that the error detection requires reference data for (a) and (c), while no reference is needed for (d); (b) is an intermediate situation in which unrealistic shapes can be detected in the DEM itself but a reference network or other external information such as a remote sensing image can help to confirm the interpretation.Therefore, both external and internal quality assessment methods, as defined in Section 4, provide a complete set of criteria to evaluate the quality of a topographic object derived from the DEM, such as the hydrographic network.Another example of DEM-derived topographic object is the variable layer of sand, snow, or ice over the ground surface.Many studies consist in measuring and analyzing the thickness of these layers based on the difference between two DEMs produced at different dates [166,167].The quality of the result depends on the cumulative quality of both DEMs.Two main influences may be mentioned to illustrate the need for a careful DEM quality specification prior to the calculation of a thickness: • The accuracy of the thickness, which can be estimated through the RMSE of the DEM difference, is: which in the case of two DEMs having the same quality can be written: RMSE = √2 RMSE (10) hence a very strong requirement in terms of DEM accuracy.
• whatever the accuracy of the two DEMs, any horizontal registration error Δx between them will Another example of DEM-derived topographic object is the variable layer of sand, snow, or ice over the ground surface.Many studies consist in measuring and analyzing the thickness of these layers based on the difference between two DEMs produced at different dates [166,167].The quality of the result depends on the cumulative quality of both DEMs.Two main influences may be mentioned to illustrate the need for a careful DEM quality specification prior to the calculation of a thickness:

•
The accuracy of the thickness, which can be estimated through the RMSE of the DEM difference, is: which in the case of two DEMs having the same quality can be written: hence a very strong requirement in terms of DEM accuracy.

•
whatever the accuracy of the two DEMs, any horizontal registration error ∆x between them will lead to a vertical error ∆z on the thickness, which is a function of the slope θ: The relevant requirement to avoid this effect is a high relative positional accuracy of the two DEMs.Systematic vertical or horizontal errors can be corrected by 3D matching prior to thickness computation.However, such a correction has to be implemented very carefully since it may absorb the thickness of the layer which is supposed to be measured.
These examples confirm the need to specify input DEM quality with suitable criteria before computing derived topographic objects.

Challenge of Global DEM Quality Assessment
The concept of global DEM has gained increasing interest since DEMs can be produced from satellite data which cover the whole planet.The goal of a global DEM can be either to provide global mapping to a single user (e.g., military program), or to provide users around the world with easy-to-access altimetric information on each area of interest.In this second case it is actually a multi-user database and it is not straightforward to reconcile multiple user requirements.Those global DEMs are extensively used in geoscientific studies [168][169][170][171].
Juxtaposing national DEMs would lead to a mosaic product with heterogeneous quality, not to mention restrictions in some countries [172].The advantage of a global DEM is to offer a homogeneous quality, even if it is an illusory ideal because no technique can provide a homogeneous quality with such a variety of climate, relief and landcover.
The launch of satellites with stereo-viewing capabilities (SPOT-1 in 1986, ERS-1 in 1991, Radarsat in 1994, etc.) and the development of image processing algorithms for photogrammetry and SAR interferometry enabled since the 1990s an industrial activity for the production of DEM on demand, and things got closer to the concept of global DEM with SPOT-5 HRS (2002) designed to increase the accuracy and the production capacity.In parallel, global DEM projects have been envisaged with dedicated missions (SRTM) or with systematic use of data from an existing mission (GDEM from ASTER, WorldDEM from TanDEM-X).Planning a global DEM had to face several challenges.Beyond the technical constraints linked to the enormous quantity of data, it was necessary to develop automatic methods for production as well as for quality control.
Studies published on the quality of a global DEM, based either on theory or on the study of a particular test site where reference data are available, often have a limited scope and do not allow easily to predict the quality on another site, or to a certain extent only [173][174][175][176][177]. The production of the first DEM after the launch of a satellite [178] should be considered as a demonstration of feasibility, likely to indicate an order of magnitude, rather than as a prediction of the quality which can be expected anywhere in the world.Two important reasons can be cited to explain the unpredictability of a global quality.The first one is that the global DEM can be a mosaic of DEM tiles of heterogeneous theoretical quality, due to the fact that the acquisition parameters vary from one tile to another, like the base to height ratio in the case of optical stereo pairs, and even unpredictable, like the baseline in the case of repeat-pass SAR interferometry.The second reason is related to the influence of relief and landcover:

•
The accuracy of elevation and slope depends on slope and landcover [90,140,146].

•
The impact of image acquisition parameters on the DEM accuracy (which must be studied to optimize the values of these parameters) also depends on slope and landcover [140,179].
Landcover can be considered to locally improve the DEM.For instance, the tree height can be removed in forested areas to model the ground surface, provided that the forest patches have been identified and located.This removal can be achieved assuming a constant tree height or through empirical methods based on tree height estimation at forest edges [180] or on the use of ancillary data such as ICESat [181].
The validation of a global DEM must rely on many test cases with different orbital configurations and different landscapes, or on simulations (see Section 4.1.2),to meet multiple requirements:

•
Allow a user to predict the quality of the DEM on a given site for a given application.

•
Guide product improvements by post-processing or by merging data from different sources.

•
Consider the variety of geographic conditions as we have just seen, to guarantee that the conclusions on the global DEM quality do not depend on particular conditions, or to limit the scope of the conclusions to a particular landscape.

•
Consider different quality criteria as suggested in Section 4 (including the artifacts produced by resampling, which have been shown to also have a significant impact on the geomorphometric quality of the final DEM), so that a variety of user needs is considered.Indeed, a global DEM is typically a multi-user database, for which it is difficult to define quality standards that are suitable for all potential users.

From Scale to Resolution
In the case of traditional analog maps, usual cartographic conventions used to define map scales (e.g., 1:20,000 topographic maps or 1:50,000 geological maps) are based on the fact that we do not expect the same accuracy and the same shape rendering in maps with different scales.The same happens with digital elevation models, except that the classical concept of scale (defined as the ratio between a distance on the map and the corresponding distance on the ground) is meaningless for digital products.Indeed, a distance cannot be defined on a digital map (stored on a hard disk or similar device) as it can be on a computer screen or on a paper map.Therefore, the classical concept of scale has to be replaced by an equivalent indicator, typically the grid mesh size in a raster DEM.The compatibility between a mesh size and a scale has been widely addressed since the very beginning of digital maps, based on the size of the smallest detail that can be reasonably represented on a map, typically 1 × 1 mm [182,183].In this context, the NIMA (National Imagery and Mapping Agency, today NGA) defined a standard for its own Digital Terrain Elevation Data (DTED) [184], with a range of three levels with increasing resolution: DTED0 with 30" sampling for gross representation of the Earth's surface, DTED1 with 3" sampling approximately equivalent to the contour information represented on a 1:250,000 scale map, and DTED2 (1", 1:50,000, resp.).More recently, the increasing availability of sensors and methods for DEM production with higher resolution has led to the definition of new standards such as NGA's HRE standard defined for higher resolution DEMs, from 12 m to 12 cm for both mesh size and vertical accuracy [185].
The term "resolution" is often confusing because it can apply to different aspects of digital cartography and is commonly used improperly or without clarity.The resolution is the ability of the DEM to discriminate objects, for instance in a geological landscape with periodic forms.It is linked to the smallest detectable wavelength and it is therefore an essential characteristic for the ability of a DEM to describe the shapes [186][187][188].The mesh size (e.g., meters per pixel) is different and can be irrelevant to the actual DEM resolution.Note that the term "pixel size" is also inappropriate for DEMs, in which the data refer to a point (i.e., intersection of a vertical axis with the terrain surface), whereas image data refer to a pixel (i.e., surface element over which a radiometric measurement is integrated), so that DEMs and images have different physical meanings although DEMs can be stored in a raster format like images.
A small mesh size is a necessary condition to guarantee a high resolution.Figure 14 shows that the availability of SRTM 1" (30 m) instead of SRTM 3" (90 m) resulted in an improved resolution.
(e.g., airborne lidar point clouds have a lower density under forests than on bare ground); moreover, a cloud (for photogrammetry) or a forest (for repeat-pass SAR interferometry) can prevent the calculation of elevations and create holes in the DEM so that the resolution is locally lost.
• Whatever the input point cloud, the interpolator imposes its form to some extent, mainly in low density areas.• Even if the initial point cloud is dense, this does not guarantee a high resolution because each point can be the result of a calculation requiring information over a neighborhood (e.g., case of a large template for image matching [84,140]), and this is not an effect of the interpolation.For these different reasons, the elevation of each point of the DEM is influenced by an area around this point.This confirms that DEM resolution is clearly different from mesh size, and user requirements should not be expressed in terms of mesh size if the relevant criterion is actually the resolution.
This resolution requirement can also be interpreted in terms of signal processing [50].Shannon's theorem gives a sampling rule and indicates the minimum rate at which a signal must be sampled, at the risk of irretrievably losing parts of it.Under certain hypotheses which are globally verified for the terrestrial relief, a signal can be decomposed into a sum of harmonics, in potentially infinite number, each harmonic (k) being characterized by an amplitude ak, a phase φk, and a frequency fk: The signal should therefore be sampled in such a way as to preserve the maximum frequency harmonic fmax, which requires a DEM density of at least two points per cycle in each dimension, and thus a sampling frequency of at least twice the highest frequency contained in the signal (the so-called Nyquist condition).However, the resolution also depends on the input data (point cloud before resampling) and on the interpolation, so that a small mesh size, although necessary, is not a sufficient condition to guarantee a high resolution.Three reasons can be given to explain why the smallest terrain shapes may be lost even with a small grid mesh:

•
The point cloud generally has a variable density, with low density areas due to physical reasons (e.g., airborne lidar point clouds have a lower density under forests than on bare ground); moreover, a cloud (for photogrammetry) or a forest (for repeat-pass SAR interferometry) can prevent the calculation of elevations and create holes in the DEM so that the resolution is locally lost.

•
Whatever the input point cloud, the interpolator imposes its form to some extent, mainly in low density areas.

•
Even if the initial point cloud is dense, this does not guarantee a high resolution because each point can be the result of a calculation requiring information over a neighborhood (e.g., case of a large template for image matching [84,140]), and this is not an effect of the interpolation.
For these different reasons, the elevation of each point of the DEM is influenced by an area around this point.This confirms that DEM resolution is clearly different from mesh size, and user requirements should not be expressed in terms of mesh size if the relevant criterion is actually the resolution.
This resolution requirement can also be interpreted in terms of signal processing [50].Shannon's theorem gives a sampling rule and indicates the minimum rate at which a signal must be sampled, at the risk of irretrievably losing parts of it.Under certain hypotheses which are globally verified for the terrestrial relief, a signal can be decomposed into a sum of harmonics, in potentially infinite number, each harmonic (k) being characterized by an amplitude a k , a phase ϕ k , and a frequency f k : The signal should therefore be sampled in such a way as to preserve the maximum frequency harmonic f max , which requires a DEM density of at least two points per cycle in each dimension, and thus a sampling frequency of at least twice the highest frequency contained in the signal (the so-called Nyquist condition).
f DEM ≥ 2 f max (13) The terrain frequency spectrum has a wide range of frequencies and it is theoretically infinite so that it is impossible to represent the complete terrain geometry in a finite size DEM, as it is equivalent to produce a map at scale 1:1, according to Umberto Eco's famous article "On The Impossibility of Drawing a Map of the Empire on a Scale of 1 to 1" [189].However, a maximum spatial frequency can be reasonably defined, based either on the resolution limitation of the imaging sensor used to derive the DEM, or on the user's requirement in terms of object size.This leads to the specification of a minimum grid density.In the case of irregular grid spacing, the sampling density can be reduced over smooth terrain (i.e., with neglectable curvature) as done for progressive and composite sampling with a subsampling criterion based on the Laplacian, i.e., a curvature index [48,190].
Given the importance of DEM resolution for the description of shapes, the mesh size that conditions the resolution must be considered in DEM quality assessment for two reasons given hereafter.

Stability of Topographic Indices with Regards to Resolution
Information extracted from a DEM (i.e., the elevation and its derivatives as well as topographic objects such as the drainage network) can be compared with the same information extracted from a reference DEM for quality assessment.This is the principle of external validation discussed above (Section 4.1).However, the DEM to be evaluated and the reference DEM are usually derived from different mapping processes.For example, the reference DEM may be of a finer scale (higher resolution), or it may be an official DEM with a reputation for reliability.In other words, one may have to compare indices in DEMs of different scales, i.e., different resolutions.It should be considered that some indices are unstable when changing scale, while others remain stable [109][110][111]191].Many studies about the impact of the mesh size on geomorphological and hydrographic indices lead to the expected result that increasing the mesh size has a smoothing effect on the DEM [192][193][194][195][196][197][198].This is of particular concern in DEM applications that use slopes for hydrological modelling or for mapping erosion, avalanche, or landslide hazards.Decisions are made for the safety of populations based on results obtained at a given scale, which may over-or underestimate the risk.Indeed, increasing the mesh size (i.e., degrading the resolution) smooths the terrain and statistically reduces the slope values [90], as shown in Figure 15: the elevation histogram is unchanged (i.e., the histograms corresponding to the different resolutions are strictly superimposed), while the slope histogram is shifted toward lower slopes.Therefore, studies using DEM to exploit slopes should always specify the scale (i.e., the mesh size), otherwise their results are meaningless.The terrain frequency spectrum has a wide range of frequencies and it is theoretically infinite so that it is impossible to represent the complete terrain geometry in a finite size DEM, as it is equivalent to produce a map at scale 1:1, according to Umberto Eco's famous article "On The Impossibility of Drawing a Map of the Empire on a Scale of 1 to 1" [189].However, a maximum spatial frequency can be reasonably defined, based either on the resolution limitation of the imaging sensor used to derive the DEM, or on the user's requirement in terms of object size.This leads to the specification of a minimum grid density.In the case of irregular grid spacing, the sampling density can be reduced over smooth terrain (i.e., with neglectable curvature) as done for progressive and composite sampling with a subsampling criterion based on the Laplacian, i.e., a curvature index [48,190].
Given the importance of DEM resolution for the description of shapes, the mesh size that conditions the resolution must be considered in DEM quality assessment for two reasons given hereafter.

Stability of Topographic Indices with Regards to Resolution
Information extracted from a DEM (i.e., the elevation and its derivatives as well as topographic objects such as the drainage network) can be compared with the same information extracted from a reference DEM for quality assessment.This is the principle of external validation discussed above (Section 4.1).However, the DEM to be evaluated and the reference DEM are usually derived from different mapping processes.For example, the reference DEM may be of a finer scale (higher resolution), or it may be an official DEM with a reputation for reliability.In other words, one may have to compare indices in DEMs of different scales, i.e., different resolutions.It should be considered that some indices are unstable when changing scale, while others remain stable [109][110][111]191].Many studies about the impact of the mesh size on geomorphological and hydrographic indices lead to the expected result that increasing the mesh size has a smoothing effect on the DEM [192][193][194][195][196][197][198].This is of particular concern in DEM applications that use slopes for hydrological modelling or for mapping erosion, avalanche, or landslide hazards.Decisions are made for the safety of populations based on results obtained at a given scale, which may over-or underestimate the risk.Indeed, increasing the mesh size (i.e., degrading the resolution) smooths the terrain and statistically reduces the slope values [90], as shown in Figure 15: the elevation histogram is unchanged (i.e., the histograms corresponding to the different resolutions are strictly superimposed), while the slope histogram is shifted toward lower slopes.Therefore, studies using DEM to exploit slopes should always specify the scale (i.e., the mesh size), otherwise their results are meaningless.A higher subsampling ratio leads to larger and less numerous cells (source: [90]).
By smoothing the surface model, subsampling tends to reduce the curvilinear length of vertical profiles and contour lines extracted from the DEM.Similarly, it reduces the length of drainage lines and watershed boundaries.This has an impact on the calculation of some hydrographic indices used to characterize watersheds.This is the case of the Gravelius index, used to characterize the compacity in watershed classifications [199]: where P is the perimeter and A is the area.The problem is that the area is stable with regard to scale but the perimeter is not [200], so that the Gravelius index is meaningless if the DEM scale (i.e., mesh size in practice) is not specified, whatever the DEM quality.Santos et al. [111] proposed a classification in which stable vs unstable hydrographic indicators are identified.
If we consider the quality criteria defined in Section 4, these examples suggest that mesh size transformation (like any form of resampling) has more impact on the landform modelling than on elevation values.

Relevant Resolution for Landform Modelling
Internal quality assessment, which consists in controlling landform realism by confronting the DEM to general rules the terrestrial relief is expected to fulfil everywhere, is generally more relevant at low resolution.This is quite an intuitive truth, since a DEM is then more likely to express pure geomorphologic properties, while high resolution reveals non-topographic objects like trees and man-made objects.
By comparing a DTM and DSM in the Amazon rainforest, Polidori and Simonetto [201] show that using a DSM instead of a DTM for slope calculation leads to an error of 5 • for a 30 m mesh (e.g., ASTER GDEM and SRTM 1") against only 2 • for a 90 m mesh (e.g., SRTM 3").Therefore, high resolution should not be preferred in forested or urban areas for applications based on slopes.This is a trap of the temptation of high resolution.Moreover, the DEM errors (except highly autocorrelated errors that increase the RMS error but generate a low impact on shapes) mainly affect the highest spatial frequencies of the DEM.For instance, the DEM artifacts illustrated in Figure 8 create or remove small shapes, typically from 1 to a few sampling units.Consequently, the highest frequencies are affected by both non-topographic objects and DEM artifacts.
The rules that express a fractal behaviour in regions where the relief is structured into watersheds (i.e., almost everywhere) are particularly well suited to highlight the spatial frequency above which landforms are altered by DEM artifacts.Thus, in the case of DEMs obtained by interpolating contour lines, the interpolator produces undesirable effects (smoothing, anisotropy) over periods shorter than the horizontal interval between the contour lines on the map.This may be revealed by comparing fractal dimension values under and over this limit [126].Similarly, Figure 10 shows how a DEM behaves with regard to Horton's law, i.e., pure geomorphology for Strahler orders of 3 and higher (large rivers) while orders of 1 and 2 (small rivers) are probably more affected by DEM artifacts [108].

Discussion
As we have seen, DEM quality assessment must be done with precautions, i.e., after specifying the nominal terrain (Section 2) and the resolution at which the quality makes sense (Section 6), and it must be based on relevant criteria among the multitude reviewed in Section 4.However, in many works the nominal terrain and the resolution are not specified by the user, and the used quality criteria are not suitable for the application.Quality control must be imposed by the user on the basis of the requirements foreseen for the application, but these requirements are not always easy to express with criteria useful to the DEM producer, not to mention the case of multi-user databases that have to reconcile a multitude of quality criteria.In practice, the quality of a DEM is specified by the producer and the quality indications highlight the product or method rather than informing users of the limitations of which they should be aware.This raises the question of who specifies the DEM characteristics.In any situation (i.e., local DEM produced on-demand or multi-user database), a DEM producer has to answer the following questions: what to represent (upper surface or ground)? in what format (vector or raster)?and for whom (for which application)?Many users utilize freely available DEMs in their studies without questioning the quality standard of these products or their convenience for their studies, which can lead to wrong results in some cases as we have pointed out [202].Although it is essential to take the application into account, a difficulty in letting the user specify the DEM quality is the risk of a disproportionate requirement, which seeks to get the most out of a method even if it is not necessary.For example, the disadvantages of high resolution have been mentioned.More generally, over-specification should be avoided although the temptation exists when a small object must necessarily be represented, which would require resolution and accuracy performances that most users do not need.For these reasons, the product specifications adopted for intermediate resolution DEMs like SRTM, TanDEM-X, ASTER GDEM, with meshes between 10 and 100 m, are generally relevant for a wider user community than high resolution products like the digital globe AES (Advanced Elevation Suite) DEM, the Airbus Elevation1 and Elevation4 DEMs, and that of the planned mission CO3D (Constellation Optique 3D), which seeks to model everything on the surface of the planet in 3D with 0.5 m ground spacing and 1 m vertical accuracy, at the risk of including ephemeral and uninteresting objects.Indeed, information on micro-relief and buildings is often available in 2D, and if 3D is really required it could be locally modelled on demand.
Beyond a posteriori quality control of an existing DEM, the quality expected from a DEM which does not yet exist may be questioned, depending on the method used and the geographical characteristics of the area mapped (climate, relief, landcover).The quality of the DEM can be predicted to a certain extent considering the instrumental and orbital characteristics of the imaging system.Indeed, theoretical equations make it possible to estimate indicators like the altimetric standard deviation σ z and therefore to predict DEM external quality.However, this approach remains limited for several reasons: • σ z is a very optimistic indicator and it may give the illusion that the accuracy of a global DEM is the same all over the world.σ z is a very limited indicator as we have shown, excellent for predicting the vertical accuracy of the DEM but unsuitable for guaranteeing a realistic shape rendering.• σ z also depends on slope and landcover, and this double influence cannot be predicted quantitatively because the slope is not necessarily known before the DEM is calculated and the landcover can be described qualitatively at best.

•
The interpolation, which is applied to resample the DEM in the required grid structure, also has an impact on the quality by creating artifacts.
Moreover, some of the parameters in the equation used to predict the DEM quality have a well-known but unpredictable influence.This is the case of the baseline in repeat-pass SAR interferometry: its influence on accuracy and phase coherence is well known, but it is impossible to predict its value before image acquisition, hence very heterogeneous performances between DEMs made yet with the same sensor and the same processing method.
Therefore, the performance of a DEM production method cannot be predicted from theoretical equations alone, it also requires a great deal of experience based on the analysis of many DEMs or on simulation-based studies.The theoretical approach also focuses on external quality criteria which we have shown to be of limited relevance for many applications, while internal criteria related to the respect of landforms can be relevant.As shown in Section 4.2, the detection of landform inconsistencies can first be based on visual analysis, which is a subjective approach of quality assessment, but we have also shown that objective methods (based on quantitative criteria) can be implemented as automated procedures to find and quantify inconsistencies.When external data are missing, this internal validation approach becomes more relevant and the corresponding quality criteria could be programmed in a dedicated software together with external quality criteria.Those criteria can be generic and applied to any DEM, but they can also be specific to a given production method or product, like a DEM expected to model the ground surface in forested areas.
Beyond mere quality assessment, the analysis of DEM errors and the understanding of the phenomena that degrade the quality lead to DEM improvement possibilities, either during the elaboration of the product, or a posteriori [203,204].The external approach (based on reference data) consists in correcting a bias, for instance removing tree height to transform a DSM into a DTM, using external data to locate forest patches and an empirical method to estimate tree height.The internal approach (based on physical or statistical assumptions) consists in removing the artifacts and forcing the DEM to comply with certain rules.For example, sinks can be filled to force rivers to stream down.The DEM can be resampled according to a required statistical behaviour, to filter noise or inversely to create variability by kriging or fractal synthesis; assumptions can also be injected a priori to improve the DEM calculation, for instance Monti-Guarnieri [205] uses a multifractal model of the Earth's surface to model the statistical distribution of slopes, which may be used, among other applications, to guide phase unwrapping [206].All these DEM improvement possibilities based on reference data or geomorphological assumptions make it possible to avoid over-specification, the risk of which has been mentioned above.This opens new opportunities to reduce costs by accepting intermediate quality DEMs that can be improved in accordance with the requirements of a given application, in particular by merging a DEM with other DEMs or additional information.This approach could benefit from recent advances in deep learning.
Finally, a wide field of research is still opening up after several decades in the field of DEM quality assessment [72,88] for the benefit of producers and users.In parallel with the development of new sensors, new image processing algorithms and tests using DEMs under new conditions, further studies based on different research fields are needed to better understand the error behaviour and its impact on the expected result for different DEM-based applications.Such studies contribute to the characterization of complex topographic objects and their interrelations, based on the adaptation of existing mathematical tools to the complexity of the real world (from signal processing to machine learning).They open promising research paths and the critical review proposed in this article allows to identify some of them:

•
Mathematical modelling of the error expected for different DEM production techniques as a function of landscape characteristics and acquisition/processing parameters.

•
Definition of invariant properties of the Earth's surface to support internal quality assessment methods, based on the advances of geoscience and even comparative planetology.

•
Analysis of scale effects, which has become a challenging research issue with the development of very high-resolution products, leading to increased needs for gigantic data volumes management and for the characterization of extremely complex phenomena.
Studies on the quality of DEMs require and stimulate fundamental research in both geoscience and mathematics, but the scientific community must also remain attentive to the expectations of DEM producers and operational users.Indeed, beyond the standards proposed by DEM producers to specify their products, there is a need to clarify relevant quality criteria that are understandable to both users and producers of digital elevation models in a common language.

Conclusions
The review presented in this article is based on a comprehensive bibliography and it links the different aspects of DEM quality assessment that are often addressed separately in the literature.The main approaches of DEM quality are presented in a methodical synthesis though far from being exhaustive, in which the quality criteria are interpreted according to user needs.Methods are proposed to detect and quantify errors and artifacts in DEMs, but also to make choices between different production techniques and to specify adapted requirements.The main conclusions are as follows:

•
The quality of a DEM makes sense for a given application, i.e., depends on user needs.

Figure 2 .
Figure 2. DTM vs DSM in the presence of trees and buildings.

Figure 2 .
Figure 2. DTM vs DSM in the presence of trees and buildings.

Figure 5 .
Figure 5. Method for the elaboration of an elevation error model.

Figure 5 .
Figure 5. Method for the elaboration of an elevation error model.

Figure 6 .
Figure 6.General workflow of simulation-based evaluation of topographic restitution methods.

Figure 6 .
Figure 6.General workflow of simulation-based evaluation of topographic restitution methods.

Figure 7 .
Figure 7. Unrealistic background landscapes in Renaissance paintings (example of Mona Lisa from Leonardo da Vinci).

Figure 7 .
Figure 7. Unrealistic background landscapes in Renaissance paintings (example of Mona Lisa from Leonardo da Vinci).

36 Figure 8 .
Figure 8.Comparison of hillshade (top) and hypsometry (bottom) for DEM artifact detection (from left to right: original DEM, DEM with Gaussian noise, 8-bit coded DEM, DEM subsampled 4 times).Source: DEM produced by contour-line interpolation in a 1:20,000 topographic map of Lebanon.

Figure 8 .
Figure 8.Comparison of hillshade (top) and hypsometry (bottom) for DEM artifact detection (from left to right: original DEM, DEM with Gaussian noise, 8-bit coded DEM, DEM subsampled 4 times).Source: DEM produced by contour-line interpolation in a 1:20,000 topographic map of Lebanon.

36 Figure 10 .
Figure 10.Fractal behaviour of the hydrographic network as illustrated by the variation of the logarithm of the number of streams for each Strahler order as a function of this order, for orders 1 to 12 (a) and 3 to 12 (b).Source: [108].

Figure 11 .
Figure 11.Control of the compliance of a DEM with Benford's law for two metrics: elevation and

Figure 10 .
Figure 10.Fractal behaviour of the hydrographic network as illustrated by the variation of the logarithm of the number of streams for each Strahler order as a function of this order, for orders 1 to 12 (a) and 3 to 12 (b).Source: [108].

Figure 11 .
Figure 11.Control of the compliance of a DEM with Benford's law for two metrics: elevation and slope (source: [125]).

Figure 11 .
Figure 11.Control of the compliance of a DEM with Benford's law for two metrics: elevation and slope (source: [125]).

Figure 13 .
Figure 13.Main types of errors in the drainage network extracted from a DEM (blue line: real network; red line: network extracted from the DEM).

Figure 13 .
Figure 13.Main types of errors in the drainage network extracted from a DEM (blue line: real network; red line: network extracted from the DEM).

Figure 15 .
Figure 15.Effect of DEM subsampling on elevation and slope histograms.The distributions of elevation (a) and slope (b) are computed for an input DEM (obtained from a SPOT-4 stereo pair overLebanon with 10 m ground sampling distance) and for subsampled DEMs with the increasing ratios of 2, 4, 8.A higher subsampling ratio leads to larger and less numerous cells (source:[90]).

Figure 15 .
Figure 15.Effect of DEM subsampling on elevation and slope histograms.The distributions of elevation (a) and slope (b) are computed for an input DEM (obtained from a SPOT-4 stereo pair over Lebanon with 10 m ground sampling distance) and for subsampled DEMs with the increasing ratios of 2, 4, 8.A higher subsampling ratio leads to larger and less numerous cells (source:[90]).