Digital Terrain Models Generated with Low-Cost UAV Photogrammetry: Methodology and Accuracy

Digital terrain model (DTM) generation is essential to recreating terrain morphology once the external elements are removed. Traditional survey methods are still used to collect accurate geographic data on the land surface. Given the emergence of unmanned aerial vehicles (UAVs) equipped with low-cost digital cameras and better photogrammetric methods for digital mapping, efficient approaches are necessary to allow rapid land surveys with high accuracy. This paper provides a review, complemented with the authors’ experience, regarding the UAV photogrammetric process and field survey parameters for DTM generation using popular commercial photogrammetric software to process images obtained with fixed-wing or multicopter UAVs. We analyzed the quality and accuracy of the DTMs based on four categories: (i) the UAV system (UAV platforms and camera); (ii) flight planning and image acquisition (flight altitude, image overlap, UAV speed, orientation of the flight line, camera configuration, and georeferencing); (iii) photogrammetric DTM generation (software, image alignment, dense point cloud generation, and ground filtering); (iv) geomorphology and land use/cover. For flat terrain, UAV photogrammetry provided a horizontal root mean square error (RMSE) between 1 and 3 × the ground sample distance (GSD) and a vertical RMSE between 1 and 4.5 × GSD, and, for complex topography, a horizontal RMSE between 1 and 7 × GSD and a vertical RMSE between 1.5 and 5 × GSD. Finally, we stress that UAV photogrammetry can provide DTMs with high accuracy when the photogrammetric process variables are optimized.


Introduction
Many applications require digital terrain models (DTMs), which are generated by interpolating points belonging to the bare land surface [1] from altimetric data produced by conventional or advanced survey methods. Among the methods of suitable quality, those based on total stations (TS) or Global Navigation Satellite Systems (GNSS) help collect accurate geographic data on the land surface. However, collecting high-resolution field data using these methods is often time-consuming and costly [2,3].
With the development and deployment of Laser Imaging Detection and Ranging (LiDAR) systems and terrestrial laser scanners (TLS), field survey data acquisition has been streamlined, as information can be obtained with higher spatial resolutions, and surveyed surfaces are better represented [4]. However, the main disadvantage of LiDAR technology is that it is still not cost-efficient [5].
The selection of the type of UAV platform (fixed-wing or multicopter) depends on the specific application, the necessary resolution in the 3D point cloud, the area and location of the study site, and the weather conditions. The 3D point cloud's accuracy appears to be independent of the UAV platform (e.g., [33,34]). Ruggles et al. [34] found that the point cloud resolution improved when using multicopter UAVs instead of fixed-wing UAVs. However, this also depends on the camera used to acquire images. Gómez-Gutiérrez and Gonçalves [33] found that a point cloud obtained using the multicopter detected smaller changes than a point cloud produced by the fixed-wing. They concluded that the fixed-wing might be a better alternative to the multicopter when exploring vertical features with similar or lower slope gradients (<52°). Multicopters can often carry a greater payload, allowing for the installation of more advanced and complex sensing systems. Fixed-wing UAVs are more suitable for capturing images of larger areas.
Due to their ability to fly at low altitudes, multicopters are more suitable when finer surface details are required, and they are commonly used to capture oblique aerial images. They can also take off and land in a small area. However, their coverage area is limited due to the relatively low flight speed and high battery drain [35], and they tend to be more negatively impacted by environmental factors, such as extreme temperatures. On sites that are not easily accessible, a platform with compact size and weight, preferably suitable to carry in a backpack, is recommended; in this situation, a multicopter is more suitable than a fixed-wing [10].

Camera Calibration
Camera calibration has traditionally been and continues to be the single most significant factor determining the accuracy potential and, to a large extent, the reliability of close-range photogrammetric measurements [36]. UAVs are generally equipped with non-metric RGB digital cameras, and are typically not designed explicitly for photogrammetric surveying.
Non-metric RGB digital cameras are a popular choice due to their light weight and low cost. However, the type of selected camera and image resolution can influence the final product accuracy (e.g., [34]). These cameras have good radiometric quality but low geometric quality, which is caused by lens distortion. Therefore, it is highly recommended to perform calibration to obtain reliable photogrammetric measurements.
Camera calibration can be performed with two strategies: either performed independently of aerial acquisitions (pre-calibration) or included in the bundle block adjustment (self-calibration). The pre-calibration is often performed in-lab using convergent images and varying scene depth [37]. While most commercial software includes camera self-calibration, this can also be realized using software (e.g., Agisoft Lens or Photomodeler) and predetermined calibration sheets.
Self-calibration has greatly simplified the calibration task and is likely to remain the most applied method within different studies. Luhmann et al. [36] described self-calibration rules for minimizing observation errors and providing more accurate calibration parameter estimates. These rules include incorporating oblique images in the project or fixed zoom/focus and aperture settings with no lens change or adjustments during image acquisition [36]. Following these well-proven rules for self-calibration can allow for reliable measurements from almost any camera.

Flight Planning and Image Acquisition
Flight planning is likely the most complex and most important part of fieldwork. It involves many considerations that have a significant influence on the quality and accuracy of the DTM. It is also not easy to go back and acquire new data due to planning or logistical problems, such as flight authorization and weather.
Based on the necessary characteristics in the final DTM (the expected resolution and accuracy), certain planning parameters are defined before the flight (Figure 1): altitude, image overlap (front and side overlap), UAV speed, parameters related to the orientation of the flight lines, and the number of ground control points (GCPs) and checkpoints (CPs). To define these parameters, it is also essential to know the platforms' operating restrictions in the country or province where the flight will occur. Despite the importance of these parameters, many works do not provide enough processing details to fully understand the causes of variability.

Flight Altitude above Ground Level (AGL)
One of the most critical parameters in a UAV flight is altitude. The altitude determines the spatial resolution of the registered images, the flight duration, the number of images per unit area, and the area covered. The flight altitude is determined by the required ground sample distance (GSD) and the camera sensor's internal parameters. Equations (1) and (2) are used to calculate the AGL, and the smallest value resulting from both equations is chosen [38].

$$\mathrm{AGL} = \frac{f \times \mathrm{GSD} \times \mathrm{HR}}{\mathrm{SW}} \qquad (1)$$

$$\mathrm{AGL} = \frac{f \times \mathrm{GSD} \times \mathrm{VR}}{\mathrm{SH}} \qquad (2)$$

where AGL is the flight altitude above ground level (m); f is the focal distance (mm); GSD is the ground sample distance (m/pixel); HR and VR are the horizontal and vertical resolutions of the sensor (px); SW is the sensor width (mm); and SH is the sensor height (mm).
Most scientific studies capture imagery with GSD values between <0.01 and 0.50 m and altitudes between 5 and 250 m [39]. A low flight altitude yields a high spatial resolution but covers a limited area on the ground and increases the flight duration and processing time for a given area. A high flight altitude (>120 m) can make the GCPs indistinguishable in the images. In many countries, the flight altitude is also commonly regulated (Table 1). High spatial resolutions do not necessarily imply high accuracy in the generated DTM. The problem associated with high spatial resolution is related to the computing power needed, since an increase in resolution means a significant increase in the data volume [40].
For mapping, if the terrain is flat or almost flat, the usual method of capturing the terrain with a UAV is to fly horizontally at a constant height above the mean sea level (MSL). In abruptly changing terrain, the flight altitude must adapt to the ground's height in each flight line instead of maintaining a constant height above the MSL.
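As a quick numerical check of the altitude calculation, the sketch below assumes the usual pinhole relation between GSD, focal length, and sensor geometry (AGL = f × GSD × HR / SW and AGL = f × GSD × VR / SH) and returns the smaller of the two values. The function name and the camera values in the example are illustrative assumptions, not values taken from the cited studies.

```python
def agl_from_gsd(gsd_m, focal_mm, sensor_w_mm, sensor_h_mm, h_res_px, v_res_px):
    """Return the flight altitude above ground level (m) for a target GSD.

    Both sensor directions are evaluated and the smaller altitude is kept,
    so that the target GSD is met along both image axes.
    """
    inputs = (gsd_m, focal_mm, sensor_w_mm, sensor_h_mm, h_res_px, v_res_px)
    if min(inputs) <= 0:
        raise ValueError("all inputs must be positive")
    # mm cancels between focal length and sensor size; GSD (m/px) x px gives metres.
    agl_from_width = focal_mm * gsd_m * h_res_px / sensor_w_mm
    agl_from_height = focal_mm * gsd_m * v_res_px / sensor_h_mm
    return min(agl_from_width, agl_from_height)


if __name__ == "__main__":
    # Hypothetical camera: 8.8 mm lens, 13.2 x 8.8 mm sensor, 5472 x 3648 px.
    agl = agl_from_gsd(0.02, 8.8, 13.2, 8.8, 5472, 3648)
    print(f"AGL for a 2 cm GSD: {agl:.1f} m")
```

The result should still be checked against the maximum altitude allowed by local regulations before it is used in a flight plan.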
The previous recommendation arises because, when a UAV flies at a constant height above the MSL, the vertical root mean square error (RMSE) values have been found to be larger in areas with complex morphology than in flat areas [42,43]. In complex morphology, the distance between the sensor and the ground is not constant, and the overlap is reduced and could become critically low in very steep areas, which causes fewer images to overlap in steeper areas compared with low areas (Figure 2).
Different authors studied the DTM accuracy with respect to different AGL values (Table 2). Gómez-Candón et al. [44] found that the RMSE of the DEM increased by 1 cm with increasing flight altitude from 30 to 60 m, while at altitudes from 60 to 100 m, the RMSE was almost constant. Agüera-Vega et al. [45] indicated that, when the GSD increased, the vertical accuracy tended to decrease, and the horizontal accuracy was not influenced by the flight altitude; they found that the vertical RMSE increased from 50 to 80 m, and, at altitudes from 80 to 120 m, it was almost constant. These studies showed that the higher the flight altitude, the greater the RMSE in the DEM; however, above a certain altitude (>60 m), the RMSE was almost constant. Rock et al. [46] found that the indirect sensor orientation accuracy decreased with increasing flight altitude; however, they reported the best accuracies for intermediate flight altitudes (100 to 150 m).
Yurtseven [47] found that the low-altitude data were affected by the phenomenon called the "doming effect", which is considered an imperfection of the 3D reconstruction algorithm in photogrammetric processing. The results reported by Zimmerman et al. [48] showed that flying at higher altitudes (>90 m) produced a more accurate DEM. Yurtseven [47] found similar results: at lower altitudes (<50 m), the error in the DEM increased.
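The loss of overlap over elevated terrain when flying at a constant height above the MSL can be illustrated with a simple geometric sketch. It assumes a nadir camera, a fixed distance between exposures planned for a reference ground level, and an image footprint that scales linearly with the actual sensor-to-ground distance; the function name and example values are hypothetical.

```python
def effective_overlap(planned_overlap, planned_agl_m, terrain_rise_m):
    """Overlap actually achieved over terrain that rises above the planning level.

    The exposure spacing stays fixed, while the footprint shrinks in proportion
    to the remaining sensor-to-ground distance. A negative result means that
    consecutive images no longer overlap at all.
    """
    if not 0.0 <= planned_overlap < 1.0:
        raise ValueError("planned_overlap must be in [0, 1)")
    distance_m = planned_agl_m - terrain_rise_m
    if planned_agl_m <= 0 or distance_m <= 0:
        raise ValueError("the sensor must remain above the ground")
    return 1.0 - (1.0 - planned_overlap) * planned_agl_m / distance_m


if __name__ == "__main__":
    for rise_m in (0, 25, 50):
        overlap = effective_overlap(0.80, 100.0, rise_m)
        print(f"terrain rise {rise_m:>2} m -> overlap {overlap:.2f}")
```

Under these assumptions, an 80% planned overlap at 100 m drops to 60% over ground that rises 50 m, which is consistent with the recommendation to follow the terrain in abrupt relief.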
Previous studies indicated that both low and high flight altitudes affected the accuracy of the DEM. The doming effect that occurs at low altitudes could be corrected by increasing the number of GCPs [47]; however, this would increase the capital and logistics costs. Instead of using a large number of GCPs, the problem can also be solved by using a GNSS RTK (Real Time Kinematic)-equipped platform, as shown in [49] and subsequently in [50]. The results shown by Gómez-Candón et al. [44] can be explained by the use of a large number of GCPs.
In this sense, it is necessary to define, for efficiency and time, a minimum altitude at which the GSD is guaranteed to resolve the desired surface details. In addition, the type of platform must be taken into account. Singh and Frazier [39] considered that the minimum mapping unit (MMU) should be considered in decisions regarding spatial resolution. The MMU may help researchers balance the data volume and processing costs to determine the most appropriate GSD for the output products. According to Table 2, the optimal flight altitude should be between 70 and 150 m, a range for which an average vertical RMSE of 2 × GSD was reported. Regarding the maximum altitude, an altitude must be selected at which the quality of the 3D point cloud is not lost, taking into account the maximum flight altitude allowed in the country.

Image Overlap
In conventional photogrammetric flights, a front overlap of 55 to 60% and a side overlap of 15 to 25% are typically recommended. However, UAV images must have a high percentage of overlap so that the photogrammetric processing of images can potentially benefit from the resulting redundancy, which would still allow the generation of high-quality 3D point clouds from dense multi-image matching [51].
There is a positive relationship between the image overlap and the accuracy of digital elevation models (DEMs). The accuracy increases with the overlap percentage, and the object's shape is optimized [52]. For Agisoft Metashape, it is recommended that UAV images be acquired with about 80% front overlap and 60% side overlap [53]. Pix4D suggests at least 75% front overlap and 60% side overlap [54]; generally, the front overlap is equal to or greater than the side overlap.
However, with exaggerated overlaps, stereoscopic vision is lost in the photogrammetric reconstruction, and the processing time is increased without improving the quality of the final products. Overlap greater than 90% can generate deformations in the digital model (e.g., [11]). In this sense, for topographic surveys and DTM generation, it is recommended to use front overlaps between 70% and 90% and side overlaps of 60% to 80%. The lower the AGL, the closer the overlap should be to the upper limit.
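To make the overlap recommendations concrete, the following sketch converts front and side overlap into the distance between exposures and between flight lines. It assumes a nadir camera mounted with the long image side across the flight direction and a footprint equal to GSD × image size in pixels; the function name and the example camera are illustrative assumptions.

```python
def exposure_spacing(gsd_m, h_res_px, v_res_px, front_overlap, side_overlap):
    """Return (distance between exposures, distance between flight lines) in metres."""
    for overlap in (front_overlap, side_overlap):
        if not 0.0 <= overlap < 1.0:
            raise ValueError("overlaps must be fractions in [0, 1)")
    footprint_across_m = gsd_m * h_res_px  # long image side, across the flight line
    footprint_along_m = gsd_m * v_res_px   # short image side, along the flight line
    base_m = footprint_along_m * (1.0 - front_overlap)
    line_spacing_m = footprint_across_m * (1.0 - side_overlap)
    return base_m, line_spacing_m


if __name__ == "__main__":
    # Hypothetical 5472 x 3648 px camera flown at a 2 cm GSD.
    base_m, spacing_m = exposure_spacing(0.02, 5472, 3648, 0.80, 0.70)
    print(f"exposure every {base_m:.1f} m, flight lines every {spacing_m:.1f} m")
```

Raising either overlap shortens the corresponding spacing, which is why very high overlaps increase the number of images and the processing time.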

UAV Speed
The UAV flight speed is a crucial user-defined parameter because it affects the image quality and power consumption of the UAV. However, studies have not considered UAV flight speed in sufficient detail, given its importance in determining the DEM accuracy. Some research has been carried out to determine the optimal speed according to the unit distance energy consumption [55], and others added the wind's effect to select the optimal speed [56].
Several variables define the selection of UAV flight speed. Among them are the maximum UAV flight speed recommended by the manufacturer, the wind speed and direction, the camera's shutter speed, and the operating restrictions of the country. Wind conditions can substantially affect 3D point clouds and DEMs, although this effect has not been well studied; high wind speeds tilt the UAV drastically and lead to large pitch and roll angles. They also cause the UAV to use more power during flight and generally reduce the UAV stability [57].
Therefore, the flight speed must be programmed considering the maximum wind speed that the platform can tolerate. The shutter speed is closely related to the flight speed; Roth et al. [58] determined that wrong shutter speed settings are a significant cause of motion blur. A long shutter time, combined with a fast flight speed, may cause motion blur. One option to reduce the motion blur is for the UAV to stop to take each image. However, this would cause greater energy consumption and, therefore, the survey area covered per battery would be reduced.
The flight speed demanded by the user should be connected to the expected quality of the images. Therefore, Roth et al. [58] proposed Equation (3), in which the flight speed is chosen based on the maximum tolerable motion blur, and recommended keeping the motion blur (usually denoted as a percentage of the size of a pixel) as low as possible, and in any case below 50%. For the limits of this equation, the original document should be consulted. However, the other variables indicated above must also be considered.

$$S = \frac{\delta \times \mathrm{GSD}}{l_t} \qquad (3)$$

where S is the UAV speed (m/s); δ is the maximum motion blur (px); GSD is the ground sample distance (m/pixel); and l_t is the shutter speed (s).
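The link between flight speed, shutter time, and motion blur can be checked numerically. The sketch below assumes that the blur in pixels equals the distance travelled during the exposure divided by the GSD, so that the maximum speed is S = δ × GSD / t; the function names and example values are illustrative, and the limits discussed by Roth et al. [58] are not modelled.

```python
def max_flight_speed(max_blur_px, gsd_m, shutter_s):
    """Maximum ground speed (m/s) that keeps motion blur within max_blur_px."""
    if max_blur_px <= 0 or gsd_m <= 0 or shutter_s <= 0:
        raise ValueError("all inputs must be positive")
    return max_blur_px * gsd_m / shutter_s


def motion_blur_px(speed_ms, gsd_m, shutter_s):
    """Motion blur (px) produced at a given ground speed and shutter time."""
    if gsd_m <= 0:
        raise ValueError("gsd_m must be positive")
    return speed_ms * shutter_s / gsd_m


if __name__ == "__main__":
    # Half a pixel of blur at a 2 cm GSD with a 1/1000 s exposure.
    speed = max_flight_speed(0.5, 0.02, 1 / 1000)
    print(f"maximum speed: {speed:.1f} m/s")
```

In practice, the result is only an upper bound: the manufacturer's maximum speed, wind, and local restrictions may impose a lower value.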

Orientation of the Flight Lines and Camera Configuration
Generally, flight plans are designed as parallel flight lines (patterns such as back-and-forth and spiral) at a stable altitude with consistent overlap and a nadir-facing camera to achieve regular along-flight-line stereoscopic coverage. This configuration has traditionally been considered the most effective for acquisition, particularly in terms of time and simplicity. It can be automatically generated by specifying a few basic flight parameters in flight planning software. However, single look-direction, gridded image blocks typically do not capture enough detail or geometric information in more complex scenes and cause the resultant point cloud to contain artificial doming due to error accumulation in the SfM process (e.g., [59]).
Therefore, various flight configurations have been studied for reducing systematic dome errors and for increasing the accuracy of the DEMs, such as single grid missions supplemented with oblique images or with an arc flight plan, double grid missions (with the acquisition of vertical or oblique images or both), or different flight altitudes in the same flight plan. Ali and Abed [60] used two types of flight configuration (single grid and double grid missions) to acquire vertical images at two different altitudes (100 and 120 m) and found a higher vertical RMSE (RMSE_z) in the DEM that was generated with the double grid mission.
James and Robson [61] indicated that augmenting the image block with an additional set of flight lines at a different azimuth heading did not significantly reduce the systematic DEM deformation. Sanz-Ablanedo et al. [62] indicated that only intermediate results are obtained when different flight designs that included only vertical imagery were mixed. The DEM accuracy is better when oblique and vertical imageries are integrated compared with when using only vertical imagery [52], as the estimation of the exterior and interior (according to [50]) orientation parameters of the airborne imagery in self-calibration is improved.
Nesbit and Hugenholtz [63] found that incorporating oblique images with 15-35° tilt angles generally increased the accuracy, and single-angle image sets at higher-oblique angles (30-35°) could produce reliable results if combination datasets were not possible. The use of oblique images is particularly appropriate in hilly terrain with rugged topography and overhangs or for surveying subvertical walls [14,42].
Therefore, the images should be acquired in preprogrammed flight using continuous automatic shoot mode [11]. Flight lines must be added to traditional flight plans (patterns, such as back-and-forth and spiral) to capture oblique images. It is true that the combination of vertical (nadir) and oblique images requires even more processing time; however, the result is a 3D point cloud with higher quality.
The orientation of the flight lines must be based on the terrain morphology. On rectangular surfaces, it is most convenient for the flight direction to be parallel to the longest side of the rectangle (Figure 3). The characteristics of the camera are shown in Table 3.
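One plausible reason for flying parallel to the longest side of a rectangular site is that fewer flight lines, and therefore fewer turns, are needed; this rationale is an assumption made here for illustration rather than a statement from the cited works. The sketch below counts the lines for both orientations, assuming one line along each edge of the site and a hypothetical line spacing.

```python
import math


def flight_lines_needed(extent_across_m, line_spacing_m):
    """Number of parallel flight lines covering a given across-track extent."""
    if extent_across_m <= 0 or line_spacing_m <= 0:
        raise ValueError("extent and spacing must be positive")
    # One line on each edge plus enough intermediate lines to respect the spacing.
    return math.ceil(extent_across_m / line_spacing_m) + 1


if __name__ == "__main__":
    long_side_m, short_side_m, spacing_m = 600.0, 200.0, 32.8
    parallel_to_long = flight_lines_needed(short_side_m, spacing_m)
    parallel_to_short = flight_lines_needed(long_side_m, spacing_m)
    print(f"lines parallel to the long side:  {parallel_to_long}")
    print(f"lines parallel to the short side: {parallel_to_short}")
```

With these example values, flying parallel to the long side needs 8 lines instead of 20, although terrain morphology and wind direction may still justify another orientation.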

Georeferencing, GCPs, and CPs
To guarantee a certain degree of accuracy in digital models using UAV photogrammetry, it might be necessary to collect GCPs. These points can be either permanent ground features or reference targets scattered on the ground, which must be surveyed to obtain their precise coordinates and ensure that they are identifiable on the raw images [64]. In addition, the numbers of surveyed GCPs should also include additional check points (CPs), which will be used to assess the resulting data quality. The GCPs are used for georeferencing the 3D point cloud and to improve the estimation of the internal and external orientation parameters in the SfM process. At the same time, the DEM accuracies will be evaluated by comparing the values of the coordinates of the CPs as computed in the aerial triangulation solution to the coordinates of the surveyed CPs.
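The CP-based accuracy assessment described above can be sketched as follows. The snippet assumes that surveyed and computed CP coordinates are given as (x, y, z) tuples in the same projected coordinate system and in metres, and it reports the horizontal and vertical RMSE both in metres and as multiples of the GSD; the function name and the sample coordinates are hypothetical.

```python
import math


def rmse_from_checkpoints(surveyed, computed, gsd_m):
    """Horizontal and vertical RMSE of computed CP coordinates against surveyed ones."""
    if not surveyed or len(surveyed) != len(computed):
        raise ValueError("both lists must be non-empty and of equal length")
    if gsd_m <= 0:
        raise ValueError("gsd_m must be positive")
    sum_sq_h = 0.0
    sum_sq_v = 0.0
    for (xs, ys, zs), (xc, yc, zc) in zip(surveyed, computed):
        sum_sq_h += (xc - xs) ** 2 + (yc - ys) ** 2  # planimetric error
        sum_sq_v += (zc - zs) ** 2                   # altimetric error
    n = len(surveyed)
    rmse_h = math.sqrt(sum_sq_h / n)
    rmse_v = math.sqrt(sum_sq_v / n)
    return {
        "rmse_h_m": rmse_h,
        "rmse_v_m": rmse_v,
        "rmse_h_gsd": rmse_h / gsd_m,
        "rmse_v_gsd": rmse_v / gsd_m,
    }


if __name__ == "__main__":
    surveyed = [(0.0, 0.0, 0.0), (10.0, 0.0, 5.0)]
    computed = [(0.03, 0.04, 0.02), (10.0, 0.0, 4.98)]
    result = rmse_from_checkpoints(surveyed, computed, gsd_m=0.02)
    print({key: round(value, 3) for key, value in result.items()})
```

Expressing the RMSE in GSD units makes results comparable between flights carried out at different altitudes or with different cameras.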
One of the problems that commonly arises with UAV photogrammetry is the number of GCPs that must be established to achieve the desired accuracy. It is widely recognized that the more GCPs used, the better the resulting accuracy will be. However, the accuracy improves asymptotically as the number of GCPs increases, levelling off once a specific GCP density is reached [46,65]. In addition, establishing large numbers of points is time-consuming and may erode many of the cost advantages of surveying with UAVs [32].
In practice, many more GCPs than the minimum required are usually established, and different recommendations for the number of GCPs are reported in various works (Table 4). Tahar [66] found that it is necessary to establish at least seven GCPs on a given surface. The author used between 4 and 12 GCPs, and the vertical RMSE in digital models decreased after using seven or more GCPs. Jiménez-Jiménez et al. [67] found that at least five GCPs distributed throughout the study area are essential. It is necessary to establish one GCP for every 3 ha to obtain vertical RMSE values close to 3 × GSD. The study area in that research was about 37 ha and approximately rectangular. The study in [68], carried out on 29 ha of urban parkland with flat terrain morphology, reported that vertical RMSE values of about 2 × GSD could be achieved when using one GCP for every 2 ha of ground area and that using more GCPs produced identical results. In a study site of 17 ha whose morphology included a wide range of slope values, Martínez-Carricondo et al. [69] found that a vertical RMSE of about 1.6 × GSD could be achieved when using one GCP per ha with a stratified distribution. Santise et al. [70] found that a vertical RMSE of about 1.3 × GSD could be achieved with approximately 1 GCP/ha (28 GCPs in 25 ha).
These recommendations range from 0.3 to 1.0 GCP/ha to obtain a vertical RMSE between 1 and 3 × GSD. The above data and other studies reported in the literature [39] show that the number of GCPs that must be established per unit area is not yet clear, at least not for all types of morphology and area sizes. Therefore, other studies have taken a different approach. Sanz-Ablanedo et al. [30] related the number of GCPs to the number of images acquired with the UAV and found that vertical RMSE values of 2 × GSD could be achieved with two or more GCPs per 100 images. Vertical RMSE values could also be improved toward 1.5 × GSD by using four GCPs per 100 images. They also found horizontal RMSE values of about ± one GSD with approximately 2.5 to 3 GCPs per 100 images. This criterion for defining the number of GCPs appears to be a good estimator, as it implicitly involves the AGL, image overlap, orientation of the flight lines, and camera configuration.
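The two ways of sizing a GCP network discussed above (per unit area and per 100 images) can be compared with a small calculation. The sketch below simply rounds the resulting counts up to whole points; the function names are hypothetical, and the densities passed in the example are the ones quoted in the text, not universal thresholds.

```python
import math


def gcps_from_area(area_ha, gcps_per_ha):
    """Number of GCPs implied by a density expressed in GCPs per hectare."""
    if area_ha <= 0 or gcps_per_ha <= 0:
        raise ValueError("area and density must be positive")
    return math.ceil(area_ha * gcps_per_ha)


def gcps_from_images(n_images, gcps_per_100_images):
    """Number of GCPs implied by a density expressed per 100 acquired images."""
    if n_images <= 0 or gcps_per_100_images <= 0:
        raise ValueError("image count and density must be positive")
    return math.ceil(n_images * gcps_per_100_images / 100)


if __name__ == "__main__":
    # 29 ha at one GCP per 2 ha, and a hypothetical block of 450 images.
    print("area-based estimate: ", gcps_from_area(29, 0.5))
    print("image-based estimate:", gcps_from_images(450, 2))
```

When the two estimates differ, the larger one is the more conservative choice, subject to the cost and access constraints of the site.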
The distribution of GCPs also influences the DEM accuracy. The accuracy may decrease slightly when increasing the number of GCPs if the GCPs are not well distributed. Different distributions of GCPs have been studied to try to optimize the products obtained by UAV photogrammetry (Table 5). Harwin and Lucieer [72] recommended that the GCPs be distributed throughout the focus area and adapted to the relief, resulting in more GCPs in steeper terrains.
Rangel et al. [9] conducted a study on 270 ha using thirteen different configurations of the number and distribution of the GCPs and concluded that the insertion of GCPs in the central part of the block did not significantly contribute to an increase in the horizontal accuracy of the geospatial products. To achieve optimal results regarding the planimetry, GCPs must be placed on the edge of the study area with a horizontal separation of 7 to 8 ground base units (horizontal distance between the centers of two consecutive images). Similar results were reported in Martínez-Carricondo et al. [69]. GCPs need to be added in the central part of the area with a horizontal separation of 3 to 4 ground base units with a stratified distribution to increase altimetric accuracy, according to Martínez-Carricondo et al. [69].
The referred studies were developed on square-shaped or rectangular-shaped terrain. However, more specific studies may be necessary to obtain the site topography (DTM) where one dimension is much larger than another, such as roads, linear power distribution, pipelines, and channels. Thus, it cannot be guaranteed that the conclusions drawn from the studies cited above can be applied to these sites. For these types of sites, Ferrer-González et al. [64] recommend using 4 to 5 GCPs/km distributed alternately on both sides of the linear work in an offset or zigzagging pattern, with a pair of GCPs at each end.
[Table 5, partially recoverable. Harwin and Lucieer [72]: low and high slopes; 5-27; 6; the GCPs should be distributed throughout the focus area and adapted to the relief, resulting in more GCPs in steeper terrains. Rangel et al. [9]: 270; abrupt changes of slope and flat areas; 6-54; 13.]
An optimal configuration of the GCPs should cover all four corners of the site and the highest and lowest elevations, with sufficient site coverage. The best accuracies were achieved by placing GCPs around the edge of the study area; however, it was also essential to place GCPs inside the area with a stratified distribution to optimize the vertical accuracy.
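For linear sites, the zigzag recommendation of Ferrer-González et al. [64] can be turned into a simple layout sketch. The snippet below assumes a straight corridor described by chainage (distance along the axis) and a lateral offset to either side, and it counts the two end pairs as part of the 4 to 5 GCPs/km density; these choices, the function name, and the default offset are assumptions made for illustration and do not come from the cited study.

```python
def corridor_gcp_layout(length_m, gcps_per_km=4.0, offset_m=20.0):
    """Return (chainage_m, lateral_offset_m) pairs for GCPs along a straight corridor.

    Negative offsets lie on the left of the axis and positive ones on the right.
    A pair of GCPs is placed at each end; the remaining points alternate sides.
    """
    if length_m <= 0 or gcps_per_km <= 0 or offset_m <= 0:
        raise ValueError("length, density and offset must be positive")
    total = max(4, round(gcps_per_km * length_m / 1000.0))
    n_inner = total - 4  # points left after the two end pairs
    layout = [(0.0, -offset_m), (0.0, offset_m)]
    step_m = length_m / (n_inner + 1)
    for i in range(1, n_inner + 1):
        side = offset_m if i % 2 else -offset_m  # zigzag: right, left, right, ...
        layout.append((i * step_m, side))
    layout += [(length_m, -offset_m), (length_m, offset_m)]
    return layout


if __name__ == "__main__":
    for chainage_m, lateral_m in corridor_gcp_layout(2000.0, gcps_per_km=5.0):
        print(f"chainage {chainage_m:7.1f} m, offset {lateral_m:+.0f} m")
```

A real corridor would follow the alignment of the road or channel, so the chainage and offset pairs would still need to be projected onto the surveyed axis.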
GCPs are not the only option for georeferencing. In recent years, direct georeferencing using a platform with a survey-grade GNSS/RTK receiver (RTK UAV) has emerged as an alternative to georeferencing using GCPs (indirect georeferencing). Hugenholtz et al. [29] used these two types of georeferencing and achieved a similar horizontal RMSE; however, the vertical RMSE values were two to three times greater with direct georeferencing. They concluded that, in applications requiring a vertical RMSE better than ±0.12 m, GCPs should be used rather than a GNSS/RTK platform. Štroner et al. [49] indicated that, with these platforms, the vertical accuracy can improve to a level of 1-2 × GSD using a small number of GCPs (at least one). Taddia et al. [73] found that, when using vertical and oblique images in the photogrammetric block (without a GCP), it was possible to obtain accuracies similar to those of DTMs referenced with GCPs. It is unclear whether direct georeferencing will supersede GCPs to become the standard referencing technique for UAV blocks. However, with the emergence of low-cost platforms (e.g., DJI Phantom 4 RTK), its use is likely to increase, especially in large or difficult-to-access areas, or where the survey of GCPs is complicated.
In the case of CPs, there is no consensus regarding the sample size. However, the National Map Accuracy Standard (NMAS) and the National Standard for Spatial Data Accuracy (NSSDA) recommend a minimum of 20 CPs. In contrast, the ASPRS Positional Accuracy Standards for Digital Geospatial Data recommend a number of CPs based on area and indicate that in no case shall a non-vegetated terrain vertical accuracy be based on fewer than 25 CPs. CPs should be distributed more densely close to essential features and more sparsely in areas of little or no interest.

Photogrammetric DTM Generation
SfM algorithms facilitate the production of detailed topographic models from images collected with UAVs. The primary product of the SfM process is a 3D point cloud of identifiable features present in the input images. Later, a DEM (DSM or DTM) and a georeferenced orthomosaic can be generated.

Software
There is a range of software packages using the SfM approach that are currently powerful and efficient enough to work with a large set of images and automatically provide results in a relatively short time. They include desktop packages, such as Agisoft MetaShape (formerly PhotoScan), Pix4D, PhotoModeler, SimActive CORRELATOR3D, Inpho UASMaster, MicMac, VisualSfM, Bundler, and CMVS, as well as online-processing solutions, such as DroneDeploy.
These software packages follow a general workflow with several phases of data processing (Figure 4). The phases include (1) importing the images into the software, (2) alignment between overlapping images, (3) georeferencing the images using GCPs to optimize the camera position and orientation, (4) dense point cloud generation, (5) ground filtering to separate ground points from above-ground object points, (6) eliminating or keeping all natural (vegetation) or built (buildings, houses, etc.) above-ground objects in the dense point cloud, (7) if the above-ground objects are eliminated, a mesh and a DTM are created, and (8) if the above-ground objects are kept in the dense point cloud, a DSM and an orthomosaic are created.
Even if the software can automatically provide results, operator intervention is necessary for certain phases of the data processing, especially to check the alignment accuracy and to remove points belonging to above-ground objects to retrieve ground points for generating DTMs [5]. One comparison [75] found that MicMac and PhotoScan (Metashape) provided similar horizontal and vertical errors within a control region (GCPs delimited). PhotoScan reconstructed topographic details better than MicMac, especially on surfaces with substantial slope changes outside of the control region. Sona et al. [76] found that PhotoScan provided good performance, especially in flat areas and in the presence of shadows. Professionals most commonly employ desktop software, such as PhotoScan, Pix4D, and PhotoModeler, because it is more straightforward to use. However, most of the processing is done in a black-box model. Many users choose open-source software (e.g., MicMac, ColMap, or AliceVision) because it is more flexible, but it is recommended for experienced users.

Image Alignment and Dense Point Cloud Generation
In the first data processing step, the images are imported. Alfio et al. [77] found that, to reduce the processing time involved with images in DNG format while preserving the quality of the photogrammetric process obtained with those images, the best type of dataset was JPEG images with a compression level of 12.
In the next step, SfM aligns the imagery by solving the collinearity equations in an arbitrarily scaled coordinate system without any initial requirements of external information (camera location and attitude or GCPs) [74]. Software packages typically generate key points in each image automatically. The number of key points in an image depends primarily on the image texture and resolution, such that complex images at high resolutions will return the most results [6]. Later, matching key points are identified, and inconsistent matches are removed.
A bundle-adjustment algorithm is used to simultaneously solve the 3D geometry of the scene, the different camera positions, and the camera parameters [74]. This step's output is a sparse point cloud generated in a relative 'image-space' coordinate system. The number of overlapping images that result after alignment is not constant throughout the area because, near the edges, there are fewer overlapping images than in the central area (Figure 5). This misalignment causes the measurements made in these areas to be less accurate than those made in the central areas; therefore, a wider area must be covered compared with the actual area of interest. This misalignment is shown in the blurring and overlapping edges of the houses shown in the upper picture of Figure 3.
Subsequently, the GCP coordinates are imported, and the GCPs are manually identified in the images. Currently, it is also possible to identify normal and coded targets automatically. Coded targets are not widely used in DTM generation because the targets must be very large for the pattern to be recognized (e.g., Metashape rounded coded targets, Pix4D QR codes). The GCP coordinates are used to transform SfM image-space coordinates into an absolute coordinate system [6].
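As an illustration of the image-space to absolute transformation, the sketch below fits a seven-parameter similarity transform (scale, rotation, translation) to matched GCP coordinates using the closed-form SVD solution. This is a minimal sketch under assumptions: the function names `fit_similarity` and `apply_similarity` are hypothetical, and the software packages discussed here may instead introduce the GCPs directly into the bundle adjustment rather than apply a separate transform.

```python
import numpy as np


def fit_similarity(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ~ s * R @ src + t.

    src: (n, 3) GCP coordinates in the arbitrary SfM image-space system.
    dst: (n, 3) surveyed GCP coordinates in the absolute system.
    Needs at least three non-collinear points.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centred point sets
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)

    # Force a proper rotation (no reflection)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0

    R = U @ D @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(S) @ D) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


def apply_similarity(points, s, R, t):
    """Map (n, 3) image-space points into the absolute coordinate system."""
    return s * (np.asarray(points, dtype=float) @ R.T) + t
```

The residuals at the GCPs after this fit describe how well the model adapts to the control, not its accuracy; independent CPs are still needed for that.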
Later, multi-view stereo image matching algorithms are applied to densify the sparse point cloud and generate a dense 3D point cloud (Figure 6). Generally, photogrammetry software offers different quality settings for building the dense cloud. This parameter affects the final DEM accuracy (e.g., [65]) and resolution (e.g., [23]). The lower the quality setting, the lower the spatial resolution and accuracy of the DEM. Therefore, if high resolution and accuracy are required, a high quality setting is recommended. However, this requires more processing time.
High densities (points/m²) in a dense point cloud can be obtained with UAV photogrammetry. The type of platform and camera, the flight planning parameters, and the quality of image processing influence this density of points. These point densities may be similar to or lower than those generated by TLS. However, for many applications, the slightly lower point densities generated by UAV photogrammetry may be an acceptable trade-off against the tremendous cost of TLS systems [78]. The point densities generated with UAV photogrammetry could hardly be achieved in the same amount of time with a traditional ground survey using a TS (e.g., [18]).

Ground Filtering and Generation of the DTM
DTM generation is essential in many applications that recreate the shape of the land surface once the external elements are removed, such as vegetation and buildings. Therefore, to derive the DTM, point clouds from the digital surface model (DSM) should be filtered to remove non-ground points, which is called ground filtering [5]. Ground filtering is a critical step in the restitution process for an accurate representation of the land surface topographic features and, in commercial software, is becoming a standard function [79].
Ground filtering is performed after a dense 3D point cloud has been generated, and the points are classified into ground points and points belonging to above-ground objects (Figure 7). After that, the DTM (Figure 8) is generated by interpolating the ground points belonging to the bare earth surface. After the dense point cloud classification, noise points can be found, and they must be manually removed. Typically, the noise points are much higher or lower than expected and do not represent any actual ground features.
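The two operations just described, spotting noise points and interpolating ground points into a raster DTM, can be sketched as follows. This is an illustrative sketch, not the method of any particular package: the function names, the median-absolute-deviation threshold, and the choice of inverse-distance weighting (IDW) as the interpolator are all assumptions made for the example. The noise test uses one global threshold, so it only suits small, fairly level patches, and it proposes candidates for manual review rather than replacing it.

```python
import numpy as np


def flag_noise_candidates(z, n_mad=5.0):
    """Flag elevations far above or below the median (robust z-score)."""
    z = np.asarray(z, dtype=float)
    median = np.median(z)
    mad = np.median(np.abs(z - median))
    # 1.4826 scales the MAD to a standard deviation for normal data
    return np.abs(z - median) > n_mad * 1.4826 * mad


def idw_dtm(ground_xyz, cell_size, power=2.0, k=8):
    """Interpolate classified ground points onto a regular grid with IDW.

    Returns the grid x coordinates, grid y coordinates, and a 2D elevation
    array indexed as dtm[row (y), column (x)].
    """
    ground_xyz = np.asarray(ground_xyz, dtype=float)
    xy, z = ground_xyz[:, :2], ground_xyz[:, 2]
    x_min, y_min = xy.min(axis=0)
    x_max, y_max = xy.max(axis=0)
    xs = np.arange(x_min, x_max + cell_size / 2, cell_size)
    ys = np.arange(y_min, y_max + cell_size / 2, cell_size)
    gx, gy = np.meshgrid(xs, ys)
    nodes = np.column_stack([gx.ravel(), gy.ravel()])

    # Brute-force distances: fine for a small cloud, too slow for a full
    # dense cloud, where a spatial index would be needed.
    dist = np.linalg.norm(nodes[:, None, :] - xy[None, :, :], axis=2)
    k = min(k, len(z))
    nearest = np.argsort(dist, axis=1)[:, :k]
    d_k = np.take_along_axis(dist, nearest, axis=1)
    z_k = z[nearest]

    # Clamp tiny distances so a node on top of a point takes that point's value
    weights = 1.0 / np.maximum(d_k, 1e-12) ** power
    dtm = (weights * z_k).sum(axis=1) / weights.sum(axis=1)
    return xs, ys, dtm.reshape(gy.shape)
```

The cell size would normally be chosen in relation to the GSD and the ground point density; cells far from any ground point are still filled by IDW, so sparse areas under vegetation deserve a visual check.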

Many commercial and non-commercial software programs have a tool to classify the dense point cloud and perform ground filtering. For example, Agisoft PhotoScan Professional performs ground filtering using the adaptive triangulated irregular network algorithm, and Pix4D software utilizes a variational raster-based approach. However, both can induce errors in the DTM by misclassifying the ground cover vegetation [80] or mistaking the soil surface for an object surface. In general, filtering approaches tend to commit more errors in terrains with many above-ground objects; therefore, ground filtering must be monitored and often corrected manually.
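To make the idea of ground filtering concrete, the sketch below classifies points with a deliberately naive rule: a point is labeled ground if it lies within a height tolerance of the lowest point in its grid cell. This is not the adaptive triangulated irregular network algorithm, the Pix4D approach, or cloth simulation filtering; the function name, cell size, and tolerance are hypothetical choices for illustration only.

```python
import numpy as np


def grid_min_ground_mask(xyz, cell_size=2.0, height_tol=0.3):
    """Return a boolean mask that is True for points labeled as ground.

    A point counts as ground when its elevation is at most `height_tol`
    above the lowest point falling in the same square cell.
    """
    xyz = np.asarray(xyz, dtype=float)
    col = np.floor((xyz[:, 0] - xyz[:, 0].min()) / cell_size).astype(int)
    row = np.floor((xyz[:, 1] - xyz[:, 1].min()) / cell_size).astype(int)
    cell_id = row * (col.max() + 1) + col

    # Lowest elevation per occupied cell
    _, inverse = np.unique(cell_id, return_inverse=True)
    cell_min = np.full(inverse.max() + 1, np.inf)
    np.minimum.at(cell_min, inverse, xyz[:, 2])

    return xyz[:, 2] - cell_min[inverse] <= height_tol
```

The rule fails in exactly the situations highlighted in the text: where a cell is entirely covered by vegetation or a roof, its lowest point is not ground, and on steep slopes true ground points exceed the tolerance. This is why automatic results need to be monitored and corrected.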
It has been reported that cloth simulation filtering is one of the most accurate algorithms to automate ground filtering on 3D point clouds obtained from photogrammetry [1]. The efficiency of this and other existing algorithms has been improved, and new algorithms have been proposed that provide more reliable and accurate results.
With technological and knowledge advancement, the efficiencies of these algorithms are expected to continue to be improved so that higher quality and more accurate DTMs can be obtained automatically to reduce the time consumed in monitoring and correcting ground filtering. Ground filtering allows for DTM generation to be done automatically and makes photogrammetry an alternative to avoid high costs when using technologies, such as LiDAR, in specific surveying applications [81].
UAV photogrammetry can also produce georeferenced orthomosaics, where terrain details can be observed. These orthomosaics can provide additional information to the topographic survey. An orthomosaic offers several advantages in topographic surveys that have not been analyzed in detail.

Geomorphology and Land Use/Cover
Topography and land-use (land-cover) patterns are the main characteristics of the physical environment that define the vertical accuracy and quality of a DTM. For DTM generation, UAV photogrammetry has competitive advantages in survey areas of bare lands or those with isolated or sparse vegetation, in projects of quantification of fill volumes and excavation of earthworks, in estimating the ground slope or monitoring elevation changes, in local area applications, and especially if repetitive data collection is needed [11]. However, the presence of vegetation can decrease the vertical accuracy and quality of the DTM.
This situation can be easily explained by the passive nature of the optical sensor, which cannot penetrate the vegetation. Thus, the vegetation's impact on the created model cannot be removed even with ground filtering algorithms. This consideration is important, particularly in areas with complex morphology (Figure 9), where the resulting point cloud will have very few ground points under the vegetation, and a high-quality DTM cannot be generated. Salach et al. [79] found a gradual increase in the error of the DTM, observing a decrease in the vertical accuracy of 0.10 m for every 20 cm of vegetation height. On surfaces with complex vegetation, it may not be possible to obtain ground points that allow for ground point triangulation to generate a suitable DTM.
In addition, it is noted that the photogrammetric method did not perform properly in areas of homogeneous texture, resulting in voids, artifacts, or sparse areas in the point cloud [2]. Elevation errors may also arise on other types of surfaces; for example, on land where there are buildings close to hills. In these areas, the elevation at the house's base is interpolated with the hills, causing the intermediate pixels to be falsely assigned a higher elevation value. This situation is mainly due to the error associated with the DTM generation.
An increasing number of examples, using different methodologies, show that the bathymetry of bodies of water, channels, or rivers can be successfully extracted from aerial images. These methodologies are applicable under certain conditions, such as clear or shallow water. Westaway et al. (2001) [82] proposed a method to obtain bathymetry in clear water using aerial images. This methodology has been adopted with UAV images in different works (e.g., [83]), although larger errors have been observed as depth increases. However, in the future, UAV photogrammetry may be a viable alternative to bathymetric LiDAR, whose costs are still high and whose resolution is relatively coarse.

Accuracy Assessment
The quality and accuracy of the DTM result from many variables that can be grouped into four categories. The first category is related to the size of the area and its morphology [45], the types of ground coverage [79], lighting conditions (e.g., cloudy), and the color contrast of the objects [84]. The second category is related to UAV data collection systems and their characteristics, the camera and its calibration [36], and the type of platform (multicopter or fixed wing), which can be a platform with a survey-grade GNSS/RTK receiver [29].
The data acquisition and flight parameters can be grouped into another category, including the flight altitude [47] and its configuration [42,43], image overlap [51,52], the UAV flight speed [58], the flight path pattern (single or double grid) [60,61], and the acquisition of nadir or oblique images [14,62,63], in addition to the number of GCPs and their distribution [30,69]. The last category is related to SfM approaches and the algorithms to automate ground filtering from the 3D point cloud [1].
Evaluating the accuracy of a 3D point cloud can be done in three different ways; generally, the data are compared to a more accurate independent source. The first involves analyzing the residuals from the bundle adjustment once the 3D model is rotated and scaled. Another method is to compare the coordinates of the 3D model with CPs. A third method is to analyze the residuals of the 3D model compared to a reference surface obtained using another technique (e.g., TLS).
In the first case, as this method does not require nor use independent measurements, the measure should be analyzed in terms of internal precision rather than accuracy [30]. The third case would be the most expensive and can be used to compare the techniques in a certain application-e.g., [21,59]. The second case is the most used and is the one that will be used in this section; CPs must be different from GCPs, since the 3D model adapts to GCPs and, consequently, the lowest residuals will always be achieved at these points-e.g., [46].
In UAV photogrammetry, the horizontal accuracy is widely recognized to be slightly better than the vertical accuracy, except for extreme topography such as a near-vertical cutslope (e.g., [12]). Various studies observed that the error, expressed as a multiple of the GSD, was lower on flat surfaces than in complex topography.
For flat terrain, a horizontal RMSE between 1 × GSD and 3 × GSD and a vertical RMSE between 1 × GSD and 4.5 × GSD have been reported in various studies (Table 6). For complex topography, a horizontal RMSE between 1 × GSD and 7 × GSD and a vertical RMSE between 1.5 × GSD and 5 × GSD have been reported in various studies (Table 7). In Tables 5 and 6, the RMSE is indicated as a multiple of GSD, that is, these last three columns represent the accuracy achieved in relation to GSD; thus, it can be more useful to compare studies with different GSD.
The geometric accuracy of DEMs derived from UAV photogrammetry and evaluated at CPs is commonly expressed through the RMSE values (Equations (4)-(7)); theoretically, the lower the RMSE value is, the more accurate the DEM. However, in different studies, other accuracy indicators have been used, such as the standard deviation (e.g., [48,67]), mean error (e.g., MA, [47,52]), mean absolute error, or linear regression.
James et al. [85] indicated that the error's spatial variability must be evaluated when using the RMSE or when the systematic error and random error cannot be identified and cannot be adequately managed. Therefore, the authors recommended including error metrics that describe the bias or accuracy (e.g., the mean error and the difference between the average of measurements and the true value) and those that describe precision (e.g., the standard deviation of error).
$$\mathrm{RMSE}_x = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(xc_i - xv_i\right)^2} \tag{4}$$

$$\mathrm{RMSE}_y = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(yc_i - yv_i\right)^2} \tag{5}$$

$$\mathrm{RMSE}_z = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(zc_i - zv_i\right)^2} \tag{6}$$

$$\mathrm{RMSE}_r = \sqrt{\mathrm{RMSE}_x^2 + \mathrm{RMSE}_y^2} \tag{7}$$

where RMSE_x, RMSE_y, and RMSE_z are the root-mean-square errors in x, y, and z, respectively; RMSE_r is the horizontal root-mean-square error; xc_i, yc_i, and zc_i are the coordinates of the ith CP in the dataset; xv_i, yv_i, and zv_i are the coordinates of the ith CP in the independent source of higher accuracy; n is the number of check points tested; and i is an integer ranging from 1 to n.
In this sense, to evaluate the DEM accuracy, various accuracy assessment methodologies have been used [27]. The ASPRS standard [86], one of the most recent, has reached wide diffusion and acceptance and has been used in various UAV photogrammetry studies [9,32]. This standard defines horizontal accuracy classes (Equation (8)) in terms of their RMSE_x and RMSE_y values, and the vertical accuracy is computed using RMSE_z statistics in non-vegetated terrain and 95th percentile statistics in vegetated terrain.
The accuracy is given at a 95% confidence level, and it is assumed that the dataset errors are normally distributed and that any significant systematic errors or biases have been removed. This accuracy means that 95% of the positions in the dataset will have an error with respect to the true ground position that is equal to or smaller than the reported accuracy value, and 66.7% of the data will have errors no larger than the RMSE. The corresponding accuracy values at the 95% confidence level are computed using NSSDA methodologies (Equations (8)-(10)).
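$$\mathrm{Accuracy}_r = 1.7308 \times \mathrm{RMSE}_r \tag{8}$$

$$\mathrm{Accuracy}_z\,(\mathrm{NVA}) = 1.9600 \times \mathrm{RMSE}_z \tag{9}$$

$$\mathrm{Accuracy}_z\,(\mathrm{VVA}) = 95\text{th percentile of the absolute vertical errors} \tag{10}$$

where Accuracy_r is the horizontal accuracy at the 95% confidence level; Accuracy_z is the vertical accuracy at the 95% confidence level; NVA refers to non-vegetated terrain; and VVA refers to vegetated terrain.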
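The metrics discussed in this section can be computed from a set of CPs as in the sketch below. It is a minimal sketch under assumptions: the function name and dictionary keys are hypothetical, the vertical factor 1.9600 is the standard NSSDA multiplier for normally distributed errors, and the horizontal factor 1.7308 presumes that RMSE_x and RMSE_y are approximately equal. The mean error and standard deviation are included alongside the RMSE to separate bias from precision, as recommended by James et al. [85].

```python
import numpy as np


def accuracy_report(cp_model, cp_survey, gsd):
    """Summarize DEM accuracy at check points.

    cp_model:  (n, 3) CP coordinates measured in the photogrammetric product.
    cp_survey: (n, 3) CP coordinates from the independent, more accurate survey.
    gsd:       ground sample distance, in the same length unit.
    """
    cp_model = np.asarray(cp_model, dtype=float)
    cp_survey = np.asarray(cp_survey, dtype=float)
    err = cp_model - cp_survey

    rmse_x, rmse_y, rmse_z = np.sqrt((err ** 2).mean(axis=0))
    rmse_r = np.hypot(rmse_x, rmse_y)

    return {
        "rmse_x": rmse_x,
        "rmse_y": rmse_y,
        "rmse_z": rmse_z,
        "rmse_r": rmse_r,
        # Bias (mean error) and precision (standard deviation) in z
        "mean_error_z": err[:, 2].mean(),
        "std_error_z": err[:, 2].std(ddof=1),
        # RMSE as a multiple of the GSD, to compare surveys with different GSD
        "rmse_r_in_gsd": rmse_r / gsd,
        "rmse_z_in_gsd": rmse_z / gsd,
        # Accuracy at the 95% confidence level (non-vegetated terrain for z)
        "accuracy_r_95": 1.7308 * rmse_r,
        "accuracy_z_95_nva": 1.9600 * rmse_z,
    }
```

A mean error close to the RMSE, with a small standard deviation, points to a systematic offset rather than random noise, which the RMSE alone would not reveal.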

Conclusions
UAV photogrammetry is an appealing method to generate DTMs due to the less stringent requirements regarding the image acquisition geometry and the high level of automation of the geometric solution and camera calibration. UAV photogrammetry allows for obtaining DTMs with high accuracy and spatial resolution at low cost.
The main conclusions and recommendations derived from this work are mentioned below.

UAV Data Collection Systems
(a) UAV Platform: Commonly, the platform is the one that is acquired first. However, it is advisable to choose a platform based on the desired application. The type of platform has no influence on the DEM accuracy but does influence the point cloud quality. To select the type of platform, the kind of terrain, accessibility to the site, and weather conditions, among other influences, must be considered.
(b) Camera calibration: it is recommended to use camera self-calibration and follow the specifications described by Luhmann et al. [36] to estimate the calibration parameters more accurately.

Flight Planning and Image Acquisition
(a) Flight altitude: several studies indicated that both low and high flight altitudes affected the accuracy and quality of the DEM. According to the works cited in the corresponding section, the optimal flight altitude should be between 70 and 150 m. If a smaller GSD is desired, more GCPs may be added, or vertical and oblique images may be combined, to counteract the doming effect and obtain the highest DEM accuracy. In addition, it is recommended that the flight altitude adapt to the ground height along each flight line instead of maintaining a constant height above the MSL.
(b) Image overlap: for non-metric RGB digital cameras, front overlaps between 70% and 90% and side overlaps of 60% to 80% are recommended. The lower the AGL, the closer the overlap should be to the upper limit.
(c) Flight speed: this variable influences the quality of the captured images. Therefore, it is necessary to estimate the speed based on the camera's configuration and the maximum tolerable motion blur, as presented in Equation (3).
(d) Orientation of the flight lines and camera configuration: the use of only vertical images is not recommended. The flight lines should not be planned only as parallel flight lines (patterns such as back-and-forth and spiral); rather, other flight patterns should be added, combining vertical images with oblique images at 15-35° tilt angles. The orientation must be based on the terrain morphology. A combination of vertical and oblique images improves the accuracy of the DTM.
(e) Georeferencing: for flat terrain with a surface area of less than 50 ha, one GCP can be used for every 3 hectares. The minimum number that should be used for a particular surface is five GCPs. For complex topography or efficiency reasons, it is recommended to use two GCPs per 100 images. The GCPs must be distributed in a stratified manner both at the edge and in the center part of the block with a separation of 3 to 4 ground base units. If RTK UAV platforms are used, it is necessary to add a minimum number of GCPs (at least one) or to combine vertical and oblique images to obtain accuracies similar to those of DTMs georeferenced only with GCPs.
(f) CPs: CPs should be at least three times more accurate than the required DTM accuracy [86]. At least 25 CPs must be established, distributed more densely close to essential features.
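The GCP and CP counts recommended in items (e) and (f) can be turned into a small planning helper, sketched below. The function names are hypothetical, and two behaviors are assumptions of this sketch rather than statements from the text: the per-area rule raises an error at 50 ha or more (where the text gives no rule), and the five-GCP minimum is also applied to the per-image rule.

```python
import math

MIN_GCPS = 5            # minimum number of GCPs for any surface
MIN_CHECK_POINTS = 25   # minimum number of CPs


def gcps_for_flat_terrain(area_ha):
    """One GCP per 3 ha for flat terrain under 50 ha, never fewer than five."""
    if area_ha <= 0:
        raise ValueError("area_ha must be positive")
    if area_ha >= 50:
        raise ValueError("the per-area rule is stated only for areas under 50 ha")
    return max(MIN_GCPS, math.ceil(area_ha / 3))


def gcps_by_image_count(n_images):
    """Two GCPs per 100 images (complex topography), never fewer than five."""
    if n_images <= 0:
        raise ValueError("n_images must be positive")
    return max(MIN_GCPS, math.ceil(2 * n_images / 100))
```

For example, a flat 30 ha site would call for 10 GCPs under the per-area rule, and a 400-image block over complex topography for 8 GCPs under the per-image rule, in both cases alongside at least 25 independent CPs.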

Photogrammetric DTM Generation
(a) Software: researchers observed that photogrammetric software did not influence the DEM accuracy. Therefore, the software should be selected based on its cost and the user's skills.
(b) DTM generation: it is recommended to perform ground filtering automatically. When the ground filtering is done automatically, it must be monitored and often manually corrected.

Geomorphology and Land Use/Cover
Using UAV photogrammetry, it is not possible to obtain DTMs from all types of surfaces. Ground points must be observed in the point cloud for triangulation and for generation of the DTM. Vegetation and water are the main limitations.

Accuracy Assessment
Generally, a vertical RMSE in the range of one to five GSDs was reported in different studies. Estimation of the accuracy only in terms of the RMSE is not recommended, as the spatial variation cannot be observed. In any case, the ASPRS standard could also be used. Governmental agencies can establish accuracy limits for their product specifications and applications and contracting purposes.
According to what was previously expressed, UAVs complement existing survey methodologies, since several limitations appear with the exclusive use of a UAV in DTM generation. Despite these limitations, UAV photogrammetry has great potential in a wide range of application areas.