Quality Assessment of Photogrammetric Methods— A Workﬂow for Reproducible UAS Orthomosaics

: Unmanned aerial systems (UAS) are cost-e ﬀ ective, ﬂexible and o ﬀ er a wide range of applications. If equipped with optical sensors, orthophotos with very high spatial resolution can be retrieved using photogrammetric processing. The use of these images in multi-temporal analysis and the combination with spatial data imposes high demands on their spatial accuracy. This georeferencing accuracy of UAS orthomosaics is generally expressed as the checkpoint error. However, the checkpoint error alone gives no information about the reproducibility of the photogrammetrical compilation of orthomosaics. This study optimizes the geolocation of UAS orthomosaics time series and evaluates their reproducibility. A correlation analysis of repeatedly computed orthomosaics with identical parameters revealed a reproducibility of 99% in a grassland and 75% in a forest area. Between time steps, the corresponding positional errors of digitized objects lie between 0.07 m in the grassland and 0.3 m in the forest canopy. The novel methods were integrated into a processing workﬂow to enhance the traceability and increase the quality of UAS remote sensing.


Introduction
Unmanned aerial systems (UAS) are widely used in environmental research. Applications encompass the retrieval of crop yield [1] or drought stress [2] in agricultural areas or the mapping of plant species [3][4][5], biomass [6,7] or forest structure [8][9][10] in nature conservation tasks. Today, UAS allow an extensive spatial coverage with high resolution that provides detailed observations on the individual plant level, e.g., for the detection of pest infections in trees [11] or rotten stumps [12]. The flexibility of UAS is also beneficial for multi-temporal observations since flights can be scheduled on short notice based on specific events like bud burst or local weather conditions. Therefore, UAS are regarded as a key component for bridging the scales between space-borne remote sensing systems and in-situ measurements in environmental monitoring systems [13].
on an iteration of point cloud filters. The reproducibility of orthomosaics is quantified by the repeated processing of the same scene and a pixel-wise correlation analysis between the resulting orthophoto mosaics. We illustrate the importance of both methods using different orthorectification surfaces and two time series in a grassland and a forest area, respectively. To foster an error and reproducibility optimized orthomosaic processing, we incorporate the new methods into an impoved UAS workflow.

Multi-Temporal Flights
Two multi-temporal UAV-based image dataset were acquired as a test sample for this study ( Table 1). The first dataset is a series of six consecutive flights over a small temperate forest patch (Wolfskaute, Hesse, Germany). The surveyed area covers 7 ha of a forested hill with an adjacent meadow and ranges from 283 m to 320 m a.s.l. (above sea level). with a canopy height of up to 37 m (Figure 1a). The six flights were performed on 2020-07-07 between 11:00 and 14:00 CEST using a 3DR Solo Quadrocopter (3D Robotics, Inc., Berkeley CA, USA) and a GoPro Hero 7 camera (GoPro Inc., San Mateo, CA, USA; Appendix B, Table A1). The flight plan was made with Qgroundcontrol and refined with a LiDAR derived digital surface model (DSM, provided by the Hessian Agency for Nature Conservation, Environment and Geology (HLNUG)) with the R-package uavRmp to achieve a uniform altitude of 50 m above the forest canopy (see Appendix A). For georeferencing and checkpoint error calculation, 13 ground control points (GCPs) were surveyed with the Real Time Kinematic (RTK) GNSS (Global Navigation Satellite System) device Geomax Zenith 35 (GeoMax AG, Widnau, Switzerland). The RTK GNSS measurements had an error of between 0.9 and 1.6 cm in the horizontal direction and between 1.9 and 3.7 cm in the vertical direction. Eight GCPs were used as controlpoints and five served as independent checkpoints to evaluate the georeferencing error ( Figure 1a).
The second dataset is an inter-annual time series of a grassland area (La Bertolina, Eastern Pyrenees, Spain). Terrain altitudes range from 1237 m to 1328 m a.s.l. The flights were performed in spring or early summer of 2013, 2015 and 2017 using an octocopter with a Pixhawk controller. Cameras and flight plans in a fixed altitude differ between the dates (Table 1), detailed camera settings in Appendix B, Table A1). The flights took place in the morning with sub-optimal illumination angles below 35 degrees [21] and partly scattered light conditions due to the presence of clouds. Five to eight GCPs were measured in each year with a conventional GPS device without RTK, from which three were used as checkpoints (Figure 1b).
Both datasets were used to empirically determine the georeferencing accuracy and reproducibility of the photogrammetrically retrieved orthomsoaics. For a better understanding of the newly introduced approaches, the following chapters first outline the general UAS image processing workflow.

Image Georeferencing
Very high resolution orthomosaics such as those resulting from UAV flights require precise positioning to avoid the introduction of complex errors in the image processing [33]. The standard GNSS receivers that are built in cameras do not provide sufficient accuracy. There are two alternative  Table 1. Overview of the flight missions used to acquire the two test datasets in forest and grassland environments. The cameras were triggered by time interval. In the forest flights, altitude refers to a uniform height above the canopy. In the grassland flights, altitude referes to a fixed relative height above the take-off point. Overlap referes to both forward (F) and side (S) overlap.

Image Georeferencing
Very high resolution orthomosaics such as those resulting from UAV flights require precise positioning to avoid the introduction of complex errors in the image processing [33]. The standard GNSS receivers that are built in cameras do not provide sufficient accuracy. There are two alternative strategies for georeferencing the UAS products: direct georeferencing of the images with a RTK on the UAV or the use of GCPs. Direct georeferencing requires the accurate time synchronization between the RTK device and the camera, which has been reported as a major source of error [33][34][35].
The use of GCP implies that the study area is accessible in order to install visible ground markers before the flight and precisely measure their position. Ideally, the GCP should be equally distributed over the study area to avoid distortions during processing [33]. During the orthomosaic processing, the ground markers need to be interactively identified in the images. Despite these drawbacks, georeferencing through GCP with general-purpose GNSS receiving systems-that are nowadays standard equipment for surveying-is still far more widespread and potentially more cost-effective than the direct georeferencing with RTK [34].
In any case, GCPs are also required for the independent validation of the referencing accuracy during the processing [24] and therefore essential for a proper accuracy assessment. The geolocation accuracy is usually given as the checkpoint root mean squared error (checkpoint error, Equation (1)) that quantifies the distance between the position of the measured GCP (XYZ gcp ) and the estimated positions of these coordinates in the photogrammetric processing (XYZ est ). It can be calculated for each direction individually.

Photogrammetric Processing
The Metashape software (Agisoft LLC, St. Petersburg, Russia; formerly known as Photoscan) is widely used for UAS image processing. The standard photogrammetric workflow includes the image alignment, the generation of a digital surface model (called Digital Elevation Model in Metashape) and the orthorectification and mosaicing ( Figure 2). The image alignment starts with the automatic identification Remote Sens. 2020, 12, 3831 5 of 18 of distinct features in the individual images. This process is enhanced and requires less computation time if the individual images already contain GNSS information. Those features which appear in more than one image, the so called tie points, are matched and projected in a 3D space, forming the sparse cloud that is georeferenced using the surveyed GCPs. The georeferenced sparse cloud is subsequently used to compute a digital surface model, either through a dense pointcloud or a mesh interpolation of the sparse cloud (Figure 2a,b). The surface model is finally used for rectifying the georeferenced images.
For each processing step, a multitude of parameters and options are available that affect the results in terms of georeferencing accuracy and orthomosaic quality. While Metashape offers default values for these parameters, the methods described below aim to optimize and alter the standard workflow to obtain high quality and reproducible orthomosaics.  For the reproducibility analysis (c) the whole photogrammetric process is repeated x times, leading to x orthomosaics from the same image source. Pixel-wise correlation analysis between pairs of orthomosaics leads to n correlation layers and n binary layers based on a correlation coefficient threshold of > 0.95. The final reproducibility layer is the sum of all binary layers.

Orthomosaic Reproducibility
Since the orthomosaics are the result of complex photogrammetric methods, its reproducibility has to be assessed. In this context, reproducibility is a measure of how identical individual orthomosaics are, if they are computed from the same image source and with identical photogrammetric processing parameters. This way, the reproducibility of the photogrammetric process itself is evaluated without the influence of changes in the surveyed environment. For this purpose, a set amount of orthomosaics is computed with identical settings (Figure 2). To quantify the reproducibility, the pixel-wise correlation coefficient of the RGB values is calculated between each pair of the computations (raster R package, corLocal function). Pixels with a correlation coefficient of 0.95 or higher are considered identical between two orthomosaics. This leads to a binary layer for all pair-wise correlations marking reproducible and non-reproducible pixels. By summing up the binary layers, regions of high and low reproducibility can be identified (Figure 2c). High values then denote a high level of reproducibility of a pixel.
The more orthomosaics are computed, the more correlation layers can be calculated (Equation 2) and the more likely a layer is to receive a non-correlating pair of pixels. Therefore, a preliminary test with an arbitrarily high number of 25 orthomosaics (x = 25) was done, which leads to n = 300 For the reproducibility analysis (c) the whole photogrammetric process is repeated x times, leading to x orthomosaics from the same image source. Pixel-wise correlation analysis between pairs of orthomosaics leads to n correlation layers and n binary layers based on a correlation coefficient threshold of >0.95. The final reproducibility layer is the sum of all binary layers.

Optimizing the Georeferencing
Each point in the sparse cloud has four accuracy attributes: the reconstruction accuracy (RA), the reprojection error (RE), the projection accuracy (PA) and the image count [35]. In particular, the RE is suggested as the quality measure of tie points [20,36]. It is the deviation of the positions of identified features in the original image from positions of the same features in the calculated 3D space. The removal of points with a high error and the subsequent optimization of the camera positions can improve the georeferencing. However, by removing too many points in the individual sparse clouds, images no longer align and the checkpoint error increases. An iterative approach is used to find the optimal RE threshold for the dataset by filtering the sparse cloud using different RE threshold values in order to minimize the checkpoint error ( Figure 2). This method was applied to each flight of the two multi-temporal datasets.
The initial pointclouds are the direct results from the feature identification and matching algorithms from Metashape. In order to account for the inherited randomness of these processes and the slight differences of the checkpoint error due to the manual alignment of the GCP, the optimization was repeated five times and the standard deviation of the checkpoint error was calculated.

Orthomosaic Reproducibility
Since the orthomosaics are the result of complex photogrammetric methods, its reproducibility has to be assessed. In this context, reproducibility is a measure of how identical individual orthomosaics are, if they are computed from the same image source and with identical photogrammetric processing parameters. This way, the reproducibility of the photogrammetric process itself is evaluated without the influence of changes in the surveyed environment. For this purpose, a set amount of orthomosaics is computed with identical settings (Figure 2). To quantify the reproducibility, the pixel-wise correlation coefficient of the RGB values is calculated between each pair of the computations (raster R package, corLocal function). Pixels with a correlation coefficient of 0.95 or higher are considered identical between two orthomosaics. This leads to a binary layer for all pair-wise correlations marking reproducible and non-reproducible pixels. By summing up the binary layers, regions of high and low reproducibility can be identified (Figure 2c). High values then denote a high level of reproducibility of a pixel.
The more orthomosaics are computed, the more correlation layers can be calculated (Equation (2)) and the more likely a layer is to receive a non-correlating pair of pixels. Therefore, a preliminary test with an arbitrarily high number of 25 orthomosaics (x = 25) was done, which leads to n = 300 correlation layers.
In practice, computing 25 orthomosaics is, in most cases, unreasonable regarding the computation time and processing resources. Therefore, the reproducibility analysis was also done with only 5 identical orthomosaic computations (i.e., 10 pairwise correlations). The comparison of both reproducibility layers revealed that summing up 10 correlation layers (x = 5) is sufficient to identify most pixels which are also denoted as non-reproducible when using 300 binary layers. The analysis of the time series and the full forest set were therefore done with only 5 identical orthomosaic computations.
In addition, the edges of the orthomosaic are heavily distorted and have a lower positional accuracy due to less image overlap [37]. Therefore, the orthomosaic should be cropped to the central area with a sufficient overlap. In the R package uavRmp provided with this study, this crop mask is automatically generated from spatial polygons defined by the seamlines (i.e., the outline of the individual image parts) of the mosaic. The outermost polygons are identified using a concave hull of the seamlines and are discarded from the orthomosaic.

Assessing the Orthorectification Surface
The standard workflow in Metashape suggests a DSM created from a dense pointcloud as the orthorectification surface (Figure 2a) [16]. In vegetation free areas, this DSM is mostly equivalent to a digital elevation model (DEM) [39] or digital terrain model (DTM) and therefore suitable for the creation of orthomosaics. In areas with vegetation, the DEM requires the classification of ground points in the dense pointcloud which is currently not viable in Metashape for structurally rich environments like forests or grasslands in the phases of maturation and flowering. Alternatively, a 2.5D mesh can be created from the sparse cloud on which the images are projected [40]. By smoothing the mesh to eliminate sharp edges, the surface can be regarded as an approximation of a DEM. This approach requires far less computational ressources since the creation of a densecloud is skipped. It is therefore more suitable for low-budget UAS setups. To demonstrate and validate its usage, the mesh surface was compared to the DSM for one of the forest scenes with respect to the reproducibility of the derived orthomosaics using the pixel-wise correlation method described above.

Time Series Accuracy
To assess the overall reproducibility of time series, reproducibility masks have been computed for each time step and overlaid to identify pixels that are reproducible over the multi-temporal data and suitable for time series analyses. To differentiate between positional errors from the photogrammetric processing and actual environmental changes between the time steps, identifiable objects and trees were digitized in each individual orthomosaic (7 geometries in the forest, 4 in the grassland). The positional shift of the bounding boxes for each digitized object was calculated between each time step. This provides a more critical assessment of time series than the individual checkpoint errors alone, since relative position differences and environmental changes between the time steps are taken into account.

Optimized Georeferencing Accuracy
To evaluate the optimized georeferencing approach, the sparse clouds were iteratively filtered with decreasing RE thresholds. The sparse clouds of the six forest flights originally included between 560,000 and 680,000 tie points (Figure 3b  In order to test the robustness of the method, the determined optimal RE threshold of 0.4 m was used to filter the sparse clouds of five identical computations of the six forest flights. The average controlpoint error in the horizontal direction was consistently below 0.02 m in all six flights and deviated less than 0.001 m in each of the five computations. The horizontal checkpoint error was between 0.02 m and 0.06 m over all six flights and deviated less than 0.01 m within the five computations. The error in the vertical direction (Z in Table 2) was up to five times higher; however, the reproducibility in each flight is still stable with a maximum deviation of 0.03 m over the five computations.
In the grassland area, the iterative point cloud filtering only marginally improved the checkpoint error since almost all tie points had already very low RE of less than 0.4 m. The final checkpoint errors for the years 2013, 2015 and 2017 were 0.29 m, 0.18 m and 0.07 m, respectively. These errors are up to 10 times higher than in the forest time series, which is mostly due to the use of a conventional GNSS In order to test the robustness of the method, the determined optimal RE threshold of 0.4 m was used to filter the sparse clouds of five identical computations of the six forest flights. The average controlpoint error in the horizontal direction was consistently below 0.02 m in all six flights and deviated less than 0.001 m in each of the five computations. The horizontal checkpoint error was between 0.02 m and 0.06 m over all six flights and deviated less than 0.01 m within the five computations. The error in the vertical direction (Z in Table 2) was up to five times higher; however, the reproducibility in each flight is still stable with a maximum deviation of 0.03 m over the five computations.
In the grassland area, the iterative point cloud filtering only marginally improved the checkpoint error since almost all tie points had already very low RE of less than 0.4 m. The final checkpoint errors for the years 2013, 2015 and 2017 were 0.29 m, 0.18 m and 0.07 m, respectively. These errors are up to 10 times higher than in the forest time series, which is mostly due to the use of a conventional GNSS measurements for the GCP in the grassland compared to the RTK GNSS measurement in the forest. Nevertheless, the five computations of the grassland time series led to very consistent checkpoint errors with standard deviations close to 0 ( Table 2).

Orthomosaic Reproducibility
To evaluate the reproducibility of orthomosaics, the images of the 4th forest flight were computed 25 times with identical photogrammetric parameters. The 300 pixel-wise correlation analysis between the 25 orthomosaics were performed within a testing area of 600 by 650 pixels showing the forest canopy ( Figure 4a). Pixels with correlation coefficients higher than 0.95 were considered reproducible between the orthomosaics. Highly reproducible pixels are characterized by consistently high correlation coefficients and therefore high values in the summed up layer shown in Figure 4b. Non-reproducible regions appear mainly in forest clearings or at dead trees. The actual canopy appears stable across multiple computations.
Using only five identical orthomosaics (i.e., 10 pairwise correlation analyses, Figure 4c) revealed the same patterns as the 300 correlation layers. Hence, only five repetitions are considered enough for subsequent reproducibility analyses.

Orthomosaic Reproducibility
To evaluate the reproducibility of orthomosaics, the images of the 4th forest flight were computed 25 times with identical photogrammetric parameters. The 300 pixel-wise correlation analysis between the 25 orthomosaics were performed within a testing area of 600 by 650 pixels showing the forest canopy (Figure 4a). Pixels with correlation coefficients higher than 0.95 were considered reproducible between the orthomosaics. Highly reproducible pixels are characterized by consistently high correlation coefficients and therefore high values in the summed up layer shown in Figure 4b. Non-reproducible regions appear mainly in forest clearings or at dead trees. The actual canopy appears stable across multiple computations.
Using only five identical orthomosaics (i.e., 10 pairwise correlation analyses, Figure 4c) revealed the same patterns as the 300 correlation layers. Hence, only five repetitions are considered enough for subsequent reproducibility analyses.

Comparison of Mesh and DSM Surface-Based Orthomosaics
To evaluate to which degree the reproducibility of orthomosaics depends on the use of the underlying mesh and DSM surface in the forest environment, both surfaces have been used in otherwise identical computation workflows. Using 5 identical orthomosaic computations, 82% and 85% of the pixels were considered reproducible using the DSM and mesh, respectively. All nonreproducible pixels were found in the forest clearing areas of the images. The surrounding meadow

Comparison of Mesh and DSM Surface-Based Orthomosaics
To evaluate to which degree the reproducibility of orthomosaics depends on the use of the underlying mesh and DSM surface in the forest environment, both surfaces have been used in otherwise identical computation workflows. Using 5 identical orthomosaic computations, 82% and 85% of the pixels were considered reproducible using the DSM and mesh, respectively. All non-reproducible pixels were found in the forest clearing areas of the images. The surrounding meadow did not differ between the orthomosaics ( Figure 5). When only the forested area is considered, the number of reproducible pixels decreased to 69% in the DSM and 74% in the mesh-based processing.
While being nearly identical in their amounts of reproducible pixels, there is still a large contrast between the orthomosaic reproducibility of the two surfaces. If a pixel in the DSM-based orthomosaics is non-reproducible between two images, there is a high probability that this pixel is non-reproducible in all images ( Figure 5). The mesh-based orthomosaics show significantly more pixels that are non-reproducible between only one or two different orthomosaics, but were stable between the other computations.
Remote Sens. 2020, 12, x FOR PEER REVIEW 10 of 19 did not differ between the orthomosaics ( Figure 5). When only the forested area is considered, the number of reproducible pixels decreased to 69% in the DSM and 74% in the mesh-based processing. While being nearly identical in their amounts of reproducible pixels, there is still a large contrast between the orthomosaic reproducibility of the two surfaces. If a pixel in the DSM-based orthomosaics is non-reproducible between two images, there is a high probability that this pixel is non-reproducible in all images ( Figure 5). The mesh-based orthomosaics show significantly more pixels that are non-reproducible between only one or two different orthomosaics, but were stable between the other computations.

Forest Time Series Reproducibility
The same canopy part of the orthomosaics as in Figure 4 was used for the assessment of the reproducibility along a time series. Figure 6 reveals that non-reproducible areas are mostly consistent between the flights. They concentrate around clearings and around the branches of dead crowns visible in Figure 4a. The actual forest canopy is reproducible. Flight conditions also seem to have an impact on the overall reproducibility. Forest 03 to 06 which were performed in cloudy conditions

Forest Time Series Reproducibility
The same canopy part of the orthomosaics as in Figure 4 was used for the assessment of the reproducibility along a time series. Figure 6 reveals that non-reproducible areas are mostly consistent between the flights. They concentrate around clearings and around the branches of dead crowns visible in Figure 4a. The actual forest canopy is reproducible. Flight conditions also seem to have an impact on the overall reproducibility. Forest 03 to 06 which were performed in cloudy conditions show less deviations between computations than Forest 01 and 02 where cloud-free conditions and low solar elevations are present.
Summing up all the correlation layers of the six flights (Figure 7) leads to a quality mask for the whole time series. This confirms that the canopy region is reproducible and stable even across the time series.

Grassland Time Series Reproducibility
In the grassland time series, the reproducibility of each orthomosaic was also tested with five identical computations. Only 1% of the pixels in the grassland area deviated between the computations of each year. In 2013 and 2015, the non-reproducible areas occur mainly in the area of a micrometeorological station in the middle of the meadow (Figure 8). In 2017 some singular pixels also deviated in the meadow areas.
Remote Sens. 2020, 12, x FOR PEER REVIEW 11 of 19 show less deviations between computations than Forest 01 and 02 where cloud-free conditions and low solar elevations are present. Summing up all the correlation layers of the six flights (Figure 7) leads to a quality mask for the whole time series. This confirms that the canopy region is reproducible and stable even across the time series.

Grassland Time Series Reproducibility
In the grassland time series, the reproducibility of each orthomosaic was also tested with five identical computations. Only 1% of the pixels in the grassland area deviated between the

Time Series Positional Accuracy
To further assess the validity of UAS time series, the positional shift between 7 digitized tree crowns in the forest and four visible objects in the grassland were calculated. Tree crowns moved by 0.3 m on average with a maximum shift of 0.75 m of one tree between forest flight 02 and 03. During the image acquisition of these two flights, the lighting conditions changed due to the presence of clouds and changing wind speeds. In Figure 9, the positional shift of 0.3 m to the left of the marked tree is visible as well as slight differences in the geometry of the crown due to different lighting conditions and wind.
The solar panel visible in Figure 6

Time Series Positional Accuracy
To further assess the validity of UAS time series, the positional shift between 7 digitized tree crowns in the forest and four visible objects in the grassland were calculated. Tree crowns moved by 0.3 m on average with a maximum shift of 0.75 m of one tree between forest flight 02 and 03. During the image acquisition of these two flights, the lighting conditions changed due to the presence of clouds and changing wind speeds. In Figure 9, the positional shift of 0.3 m to the left of the marked tree is visible as well as slight differences in the geometry of the crown due to different lighting conditions and wind.
The solar panel visible in Figure 6 is one of four objects which were digitized to measure the positional accuracy of the grassland time series. Between the individual time steps, on average, the polygons differed by 0.03 m in their position. The largest deviation occurred between the orthomosaics of 2013 and 2017 with a maximum shift of 0.07 m between one object. Hence, environmental changes in the grassland have less impact on the time series.
tree is visible as well as slight differences in the geometry of the crown due to different lighting conditions and wind.
The solar panel visible in Figure 6 is one of four objects which were digitized to measure the positional accuracy of the grassland time series. Between the individual time steps, on average, the polygons differed by 0.03 m in their position. The largest deviation occurred between the orthomosaics of 2013 and 2017 with a maximum shift of 0.07 m between one object. Hence, environmental changes in the grassland have less impact on the time series.

Discussion
The increasing use of UAS imagery in science and science-related services demands a operational processing and reliable validation techniques in the commonly used photogrammetric workflows. This study introduces two optimizations to the conventional photogrammetric workflow: (1) a new optimization for the georeferencing workflow and (2) a novel technique aiming to evaluate the repeatability of photogrammetrically retrieved orthomosaics. The application of these methods demonstrated the possibility to acquire accurately referenced UAS orthomosaic time series with low-cost UAVs and RGB cameras for both forest and grassland environments. The reproducibility of orthomosaics was highly dependent on the vegetation structure of the survey area.

Optimized Georeferencing Accuracy
The determination of optimal tie point filters leads to positional precisions of less than 6 cm in forested areas. Regarding the GSD of 2.58 cm/px the resulting orthomosaics have a positional error of up to three pixels. This error is stable over multiple computations and different sets of images from the six flights over the forest. This suggests that the iterative filtering approach leads to robust RE thresholds and only needs to be computed one time.
The difference of 0.04 m in the checkpoint error between the six flights could come from different GNSS satellite constellations or cloud conditions [41] over the three hours the flights took place, but, most likely, these small differences come from slight inaccuracies during the manual alignment of the GCP. This suggests that the operational workflow consistently leads to viable orthomosaics with resolutions of less than 10 cm, which is more than sufficient for detailed spatio-temporal structural analysis of forests [9,10]. In Belmonte et al. [8], a checkpoint error of 1.4 m and a GSD of 15 cm led to validated object-based analysis even in moderately dense canopies. The accuracy in the experimental forest areas even keep up with the checkpoint error in the grassland time series (between 0.04 m and 0.08 m). This also compares very well with other studies in structurally sparse landscapes where checkpoint errors tend to be very low [33,42].
The grassland time series further demonstrates that the proposed methods of optimization and validation work outside of the experimental setup. Differences in flight planning, low quality GNSS measurements at the GCP and the usage of different cameras still led to consistent and accurate orthomosaics. Hence, the provided workflow can be used as a fully operational method in grassland and agricultural contexts.

Pixel-Wise Reproducibility of Orthomosaics
The pixel-wise correlation of identically computed RGB orthomosaics leads to a quantitative measurement of reproducibility. This is a necessary addition to assess the photogrammetic processing of images, especially considering the "black box" nature of non-open-source software like Metashape. With the pixel-wise approach, deviations between computations are assigned to certain spatial regions of the orthomosaic. The mesh and DSM as orthorectification surfaces in the forest time series showed similar amounts of reproducible pixels (DSM: 69%, mesh: 74%). However, the mesh is considered superior since it leads to a better reproducibility in canopy areas. The calculation of the mesh is also less time consuming than the computation of the DSM. Both digital surfaces failed to reproduce fine structures like single tree branches or forest gaps. This can be problematic, since these structures are most likely the ones researchers aim to observe with UAS imagery [11,16,43].
The results also suggest that non-reproducibility can be tracked down to uncertainties in the initial step of the photogrammetric process, the feature identification and feature matching of the individual images [31]. These uncertainties increase with the presence of fine structures in the images since they are prone to move even under light wind conditions. It is therefore more likely that their position changes in consecutive images. In particular, Döpper et al. (2020) [44] recently demonstrated that acquiring UAV data for forest, grasslands and crop environments in low-light conditions such as low Solar Elevation Angle or high cloud cover causes problems in matching characteristics in the image alignment process.
Although this study declares these areas as not reproducible, the structures are still apparent in the orthomosaics. Image analysis methods (e.g., an object-based classification) of these areas might still lead to viable results and consistent geometries. This should be investigated in subsequent studies.

Time Series
The combination of multiple reproducibility layers enables the validation of UAS derived orthomosaic timeseries. The high reproducibility of multi-temporal grassland orthomosaics confirms the valid analysis of vegetation dynamics in grassland and agricultural studies. Forested area time series are also possible, however non-reproducible regions have to be considered.
The checkpoint error of each orthomosaic in the time series alone gives no insight into the positional relation between the individual time steps. Image acquisition with the UAV, analysis tools (processing software), field experiment designs and environmental conditions have a strong impact on the geometric accuracy of photogrammetric products [6,45,46]. Hence, it is essential to quantify the geometric accuracy on aerial imagery when combining UAS data from different flights, dates and sources [47]. We suggest the addition of geometry-based deviations from digitized objects. In case of the grassland time series, a maximum positional shift of 0.07 m between the time steps is tolerable for most use cases such as the modelling of the temporal dynamics of biophysics and biochemical variables of the meadow canopy or even analyze the variability in size and distribution of vegetation patterns (Lobo et al. in prep). This error also lies in the range of the individual checkpoint errors (0.04 to 0.08 m). In the forest time series, similar accuracies were achieved in the surrounding meadow areas. However, the canopy showed positional deviations of up to 0.5 m in digitized trees. A proportion of this error comes from the actual movement of the canopy due to wind and changes in the lighting conditions [44]. The non-reproducibility of some parts of the canopy, especially at forest clearings may also contribute to this error. We suggest object-based analysis instead of pixel-based approaches when high-resolution forest time series are regarded.

Improved UAS Workflow
The suggested methods of checkpoint error optimization and reproducibility validation complement the general UAS workflow. In order to make these methods more accessible to users, we provide a Python module-MetashapeTools (https://github.com/envima/MetashapeTools)-which utilizes the Metashape API for an improved photogrammetric workflow. The orthomosaic processing in form of a script-based workflow ensures the documented parameterization of all the modules in Agisoft Metashape. The workflow is therefore shareable and can be easily integrated into a version control system, making UAS research more transparent. Apart from the manual alignment of the GCP, the photogrammetric process is fully automated. The default parameters in the MetashapeTools are the results of the experimental forest flights and a starting point for a multitude of flight areas. The script-based framework provides flexibility to alter different parts of the workflow and, e.g., integrate alternative processing steps for time series like in Cook et al. [48].
In the future, the general workflow should utilize only open-source software. Currently, Agisoft Metashape is the de facto standard and the most promising software in affordable UAS image processing [3,13]. The development of open-source photogrammetry projects like OpenDroneMap are promising and will be integrated once they are fully operational. The transition from proprietary software towards open and transparent workflows is an ongoing trend worth supporting in spatial analyses [49]. For now, publications utilizing Metashape or other "black box" software should at least include the checkpoint error and the full parameterization of the processing modules. Ideally, the parameterization can be provided as a script, e.g., as a supplementary material or published in a repository. Although the computation of the reproducibility layer can be intensive, its inclusion in studies provide the necessary transparency about the quality and interpretation of the orthomosaics. The documented and evaluated orthomosaics are a big contribution to environmental mapping and monitoring system [50].

Conclusions
The rising popularity of UAS imagery in all fields of spatial research led to a variety of processing approaches. The supposedly ease of use and low cost of ready-to-fly UAS opened up some pitfalls in the image acquisition and processing which this study addressed. The evaluation of the orthomosaic accuracy aimed at the reproducibility of the final product. The presented optimization of the georeferencing accuracy based on the checkpoint error and the quantification of the orthomosaic reproducibility enhance the UAS workflow with the necessary quality assessment. This complements the standardized acquisition of high quality UAS time series.
In forest environments, there are still some shortcomings of UAS orthomosaic reproducibility that quantitative analyses need to consider. In grassland environments, these issues are marginal, which supports the validity of UAS in agricultural applications. The novel approaches of this study and their incorporation into a workflow are promising for validated and transparent UAS reserach.

Appendix A
Flight mission planning is the basis for all UAS derived orthomosaics and therefore crucial for high quality and reproducible image processing. The planning requires the consideration of hardware limitations like UAS speed or the image sampling rate of the camera as well as the aspired ground sampling distance. Further, individual images need to overlap sufficiently in order to process the orthomosaics.
The provided R-package uavRmp strives for the automated and reproducible creation of flight tracks. The package helps users by suggesting image sampling rates and UAS speed with the given camera parameters and the required overlap and GSD. Rectangular study areas can directly be planned in R. Furthermore, uavRmp provides a high resolution surface following mode if a digital elevation model is provided. This makes it possible to follow detailed structures like forest canopies and areas with steep terrain. The camera is also oriented in a fixed direction for the whole mission. The flight is automatically split into multiple MAVlink protocols according to a provided battery lifetime including a safety buffer for proper operations.