Refining the Joint 3D Processing of Terrestrial and UAV Images Using Quality Measures

The paper presents an efficient photogrammetric workflow to improve the 3D reconstruction of scenes surveyed by integrating terrestrial and Unmanned Aerial Vehicle (UAV) images. In recent years, the integration of these kinds of images has shown clear advantages for the complete and detailed 3D representation of large and complex scenarios. Nevertheless, their photogrammetric integration often raises several issues in the image orientation and dense 3D reconstruction processes. Noisy and erroneous 3D reconstructions are the typical result of inaccurate orientation results. In this work, we propose an automatic filtering procedure which works at the sparse point cloud level and takes advantage of photogrammetric quality features. The filtering step removes low-quality 3D tie points before refining the image orientation in a new adjustment and generating the final dense point cloud. Our method generalizes to many datasets, as it employs statistical analyses of the quality feature distributions to identify suitable filtering thresholds. The reported results show the effectiveness and reliability of the method, verified using both internal and external quality checks, as well as qualitative visual comparisons. We made the filtering tool publicly available on GitHub.


Introduction
In recent years, Unmanned Aerial Vehicle (UAV) platforms [1][2][3][4] have become widely used for image or LiDAR (Light Detection and Ranging) data acquisition and 3D reconstruction purposes in several applications [5][6][7][8][9][10][11]. Such platforms have brought undisputed advantages, particularly when combined with terrestrial images. The increasing number of applications where UAV-based images are combined with terrestrial acquisitions is related to the opportunity to achieve complete and detailed 3D results. Despite the promising results of this image fusion approach, several issues generally arise from the joint processing of such sets of data. Large perspective changes, different image scales and varying illumination conditions can, in fact, deeply affect the adjustment and orientation outcomes and, consequently, the subsequent photogrammetric products.
Nowadays, the automation of image-based techniques has simplified the procedures of 3D reconstruction, also allowing non-expert users to digitally reconstruct scenes and objects through photogrammetric/computer vision methods. In automatic processing pipelines, increasingly robust operators and algorithms have reduced the manual effort and the processing time required in the traditional photogrammetric procedures. At the same time, however, these solutions offer less control over the processing steps and the quality of the final products.
The quality of the generated 3D data strongly depends on the characteristics of the employed sensors, on the photogrammetric network design and on the image orientation results. In many

Image Network Configurations
The impressive technological advancements in automated algorithms for image orientation (SfM) have led users to two outcomes: (i) being able to generate visually pleasing textured 3D results with minimal manual effort and (ii) acquiring many images, very often more than strictly necessary, with an unfavourable geometry and in inappropriate positions. However, the effectiveness of automated processing algorithms in identifying and matching corresponding features in overlapping images is often not sufficient to guarantee the required accuracy and completeness of the 3D reconstruction. Image network design and its quality have been deeply investigated [29][30][31][32][33][34]. The relevance of suitable base-to-height ratios and camera parameters [15,29,35], and the advantages of larger intersection angles as well as higher image redundancy [36,37], have been analyzed to identify the accuracy achievable with some standard and straightforward image network configurations. A rough quantitative estimation of the expected quality for a designed camera network has been proposed in [29]: in Equation (1), the precision σ_Z in the Z direction (camera-to-object) for orthogonal acquisitions is determined as a function of the distance Z to the object, the baseline B (i.e., the distance between two cameras), the camera focal length c and the precision of the image measurements σ_pξ (which depends on the accuracy of the feature extraction and matching). The role of the baseline has changed in recent years: contrary to theory, automatic image matching procedures require short baselines, since large perspective changes can decrease the object precision.
In the very common case of convergent images, the proposed Equation (1) can be generalized for all object coordinates as Equation (2), where k is the number of images for each camera position (generally 1) and q is a design factor indicating the quality of the imaging network and depending on the intersection angle between homologous rays (with an optimal network, 0.5 < q < 1.2). Homogeneous precision values are reached when the intersection angles are close to 90°. The proposed formulas for estimating and analyzing the quality of the camera network are, however, only applicable when a uniform and regular acquisition geometry is respected, a single sensor is used for the image capture and the environment can be controlled. When data are multi-sensor and multi-scale and scenes are quite complex, further investigations are needed, involving, e.g., metrics produced by the bundle adjustment process [12].
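The two precision relations referenced above are not reproduced in this extract; a minimal sketch, assuming the standard normal-case form σ_Z = (Z/B)·(Z/c)·σ_pξ and the Fraser-style generalized form σ = q·(Z/c)·σ_pξ/√k (both standard textbook expressions, not copied from the paper), illustrates how the expected precision scales with distance, baseline and redundancy:

```python
def sigma_z_normal_case(Z, B, c, sigma_img):
    # Depth precision for the normal (parallel-axis) case:
    # degrades with the square of the camera-to-object distance Z
    return (Z / B) * (Z / c) * sigma_img

def sigma_convergent(Z, c, sigma_img, q=0.7, k=1):
    # Generalized estimate for convergent networks: q encodes the
    # network geometry quality, k the number of images per station
    return q * (Z / c) * sigma_img / k ** 0.5

# Example: Z = 10 m, B = 2 m, c = 50 mm, 1 micrometre image precision
depth_precision = sigma_z_normal_case(10.0, 2.0, 0.05, 1e-6)  # 1.0 mm
```

Doubling the baseline B halves σ_Z in the normal case, while in the convergent form the same effect is captured implicitly by a smaller design factor q.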
In recent years, quality assessments for different block configuration designs have been partially explored for close-range terrestrial and UAV-based acquisitions [38]. Methodologies for checking the quality of the obtained results, deformation analyses and some recommendations for typical photogrammetric networks, especially those employed in the cultural heritage field, have been presented in [14,39]. Results of different imaging geometries for large-scale acquisitions are compared in the close-range terrestrial scenario and when UAV and terrestrial images are processed jointly. Reference [40] proposed a broad quality assessment in different acquisition scenarios, comparing different network configurations and processing conditions (e.g., weak image blocks, effects of self-calibration, the influence of Ground Control Points (GCPs) and their distribution, the number of extracted tie points and the consequences of some image acquisition settings). In [41], the effects of complex network configurations and significant image-scale variation on object point precision are investigated in depth, analyzing the quality of the bundle adjustment results with irregularly distributed camera stations.

Point Cloud Filtering
Sensor limitations, weak acquisition geometries and issues in the processing steps can return noisy and incomplete 3D reconstructions. Noise and outliers are, respectively, random errors in the measured data and points which behave differently from the set of normal data. These are common phenomena in both range-based and image-based data, and they can be removed or mitigated through a filtering step.
The solutions proposed in the last years can operate at different levels (i.e., sparse or dense point clouds, or meshes). However, most of them focus on methods for denoising meshes. Fewer methodologies have instead been implemented for point clouds, despite the obvious advantages of working at a lower data level [19] in terms of reduced computational effort and improvement of the subsequent reconstruction products. An extensive review of the algorithms developed for filtering point clouds has been proposed by [42].
Depending on the specific nature of 3D point clouds, many approaches are based on statistical methods [43]. These methods assume that the data follow some statistical model for classifying points as outliers, defining probability distribution functions [44] or assigning specific weights in statistical denoising frameworks [45,46]. Global or local solutions proposed for outlier detection and removal are mainly based on a balance between the cleaning demand and the preservation of features and/or shape. Most of the proposed filtering and smoothing methods analyze local or global geometric properties computed on the point clouds [17,47]. Projection-based filtering techniques use several projection strategies on estimated local surfaces to identify outliers. Many of these approaches are based on regression methods, such as Moving Least Squares (MLS) [48].
Other, more specific procedures have been developed for denoising image-based point clouds and improving the entire photogrammetric workflow. Following initial works on efficient solutions for the extraction of reliable and accurate tie points [49][50][51], more recent studies focus on tie-point filtering to improve the orientation step. In the context of UAV-based datasets, [52] presented an outlier detection method based on the statistical distribution of the reprojection error, proposing a noise reduction procedure that considers the intersection angles among homologous rays. In [53,54], new strategies to assign quality measures to the extracted tie points were proposed in order to refine the camera parameters through a filtering step based on an acceptable reprojection error and a homogeneous distribution of tie points. Working at a lower level (i.e., on the first reconstruction results), these methods offer a deeper knowledge of the quality issues which could affect the subsequent products.

Work Premise and Motivation: Challenging Image Network Configurations
Before developing the filtering methodology (Section 3.3), some tests were conducted to recap the effects of different image networks (terrestrial with convergent images, terrestrial with almost parallel views, UAV or combined) on the photogrammetric processing and on the achievable accuracies (hereafter measured with the a-posteriori standard deviation σ computed within the bundle adjustment process). Figures 2 and 3 show the considered image network and error distributions, respectively. The errors are also more accurately summarized in Table 1.

The first three cases are consistent with the expected results. Average σ values are generally higher in the camera depth directions. In the convergent geometry (network 1a), average σ values are higher in the x-y plane; with parallel-view images (network 1b), precision is worse along the camera depth direction (y-axis); with UAV images (network 1c), precision is worse in the y-z plane. Furthermore, reconstructed 3D tie points show higher σ values when the sensor-to-object distance increases.
From the joint processing of terrestrial and UAV images (network 1d), a more homogeneous error distribution in the three directions can be noticed, as well as a general worsening of most of the statistical values. This highlights that the combination of different-scale and different-sensor data, which returns hybrid and complex imaging geometries, defines challenging processing scenarios where outliers and wrongly computed matches frequently and heavily affect the quality of the image orientation and 3D reconstruction. The effectiveness of the implemented filtering procedure is tested on these challenging data, and the results are presented and discussed in Section 4.

Quality Parameters of 3D Tie Points
The quality of the camera orientation process can be derived from external and internal consistency checks [31]. While checking the results with external data provides a measure of possible block deformations, internal statistics are reliable indicators of the quality of the features extraction and matching steps, and, indirectly, of the image block strength.
In this work, some inner quality parameters of each 3D tie point [12,16,57] are computed with an in-house tool and used to remove bad-quality tie points. The considered parameters express the quality of the image network geometry, the correctness of the matching and adjustment steps, as well as the reliability and precision of the reconstructed 3D points. We consider the following quality parameters of the sparse point cloud derived within the bundle adjustment procedure:

(c) Maximum intersection angle: the largest angle between homologous rays contributing to the creation of a 3D point. Small intersection angles can negatively affect the adjustment procedure and reduce its reliability. (d) A-posteriori standard deviation of object coordinates (σ): from the covariance matrix of the least squares bundle adjustment, the standard deviations of all unknown parameters can be extracted. High standard deviation values can highlight 3D points with unsuitable object coordinate precision and problematic areas within the image network.

Filtering Technique
Once each 3D tie point is enriched with its quality parameters (Section 3.2), an aggregated quality score is computed as a linear aggregation of the different parameters (Section 3.3.2). The aggregated quality score of each 3D tie point is used as an indicator of the quality of the reconstructed point within the adjustment procedure. This quality measure is employed in the automatic filtering approach to discard low-quality points (Section 3.3.3) before running a new bundle adjustment to refine the orientation results.

Single-Parameter Filtering: Tests and Issues
Many filtering methods employ a single quality parameter to discard bad points. To highlight that quality features are correlated and should be considered concurrently, a dataset of 133 terrestrial and 87 UAV images (Section 4.1.1) is processed in order to derive camera parameters and a sparse 3D point cloud. The quality parameters (re-projection error, multiplicity, intersection angle and a-posteriori standard deviation) are then individually used to filter bad 3D points, a new bundle adjustment is performed, and new quality parameters are computed. Table 2 shows the variations of the single quality parameters when only one is used to filter the sparse point cloud derived from the bundle adjustment. The variations of the median values are not consistent: e.g., filtering tie points considering only their multiplicity leads to a higher multiplicity but a worse median re-projection error; filtering using only the re-projection error decreases the median intersection angles, etc. Such results confirm the limits of a single-feature filtering approach and the need to adopt a filtering method based on combined and aggregated parameters. Table 2. Improvements (+) and worsening (−) of the median values of the considered quality features when only one feature is used to filter the 3D tie points.

Table 2 columns — Variations of Single Quality Parameters: Re-proj. Error | Multiplicity | Inters. Angle | A-Post. Std. Dev.

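The single-parameter baseline described above can be sketched as a simple threshold on one quality feature at a time; the function and field names below are illustrative, not the paper's actual tool:

```python
def filter_by_feature(points, feature, keep_low=True):
    # Single-parameter filtering: keep the tie points on the "good" side
    # of the median of one quality feature (dict field names are examples)
    vals = sorted(p[feature] for p in points)
    n = len(vals)
    median = vals[n // 2] if n % 2 else 0.5 * (vals[n // 2 - 1] + vals[n // 2])
    if keep_low:  # e.g. re-projection error: lower is better
        return [p for p in points if p[feature] <= median]
    return [p for p in points if p[feature] >= median]  # e.g. multiplicity
```

Running this once per feature, as in the test above, changes the distributions of the other features in inconsistent directions, which is exactly why the paper aggregates the parameters instead.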

Normalization and Linear Aggregation
A data normalization process is mandatory for assigning equal weight to each quality parameter. Several statistical normalization approaches allow differently measured and computed values to be adjusted and scaled to the same range [0, 1] [58].
In past research [16], a min-max feature scaling method was tested: a simple linear transformation allowing independent variables to be rescaled to the same range based only on the minimum and maximum values.
In the current work, we explore the advantages of adopting a logistic function for representing and scaling the data, thus reducing the effect of outliers. The employed logistic function is represented by a sigmoid curve with an inflection point (the sigmoid midpoint). Outliers are penalized by the characteristic trend of this curve, which grows rapidly but slows down near the boundaries. In its general formulation, the function is expressed as Equation (3):
f(x) = L / (1 + e^(−k(x − x0)))
where:
• L is the maximum value of the curve;
• e is Euler's number;
• x0 is the x value of the sigmoid's midpoint;
• k is the steepness of the curve.
The two main adjustable parameters of this function are L and k (vertical and horizontal scaling parameters), which define how the curve is stretched or compressed to better fit the data. Following [59], we adopted a modified logistic function which proved to be efficient for comparable normalization problems (Equation (4)), where x is the value to normalize, µ the mean and σ the standard deviation of the data. The normalized quality parameter values are then linearly combined with the proposed aggregation method of Equation (5), where:
• A_v is the overall aggregated quality score computed from the normalized quality parameters of each 3D tie point;
• V_re is the normalized value of the re-projection error;
• V_mul is the normalized value of the multiplicity;
• V_ang is the normalized maximum angle of intersection;
• V_std is the normalized value of the a-posteriori standard deviation;
• w is a weight computed for each 3D tie point as in Equation (6), w = M_x/M_max, with M_x being the multiplicity value of the considered tie point and M_max the maximum multiplicity value of the entire dataset.
The choice of weighting the aggregation function by the multiplicity value of each 3D tie point reinforces the quality score of points measured and checked in a higher number of images (i.e., theoretically more reliable). Tests on the variation of the achieved filtering results, with and without weighting the aggregated score, are presented in Section 3.3.4.
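The normalization and aggregation steps can be sketched as follows. Two assumptions are made explicit here, since the exact formulas are not reproduced in this extract: the modified logistic is taken as 1/(1 + e^(−(x − µ)/σ)), and the parameters are combined with an equal-weight sum scaled by the multiplicity weight w:

```python
import math

def logistic_normalize(values):
    # Sigmoid normalization centred on the mean and scaled by the standard
    # deviation (ASSUMED form of the modified logistic of Equation (4))
    mu = sum(values) / len(values)
    sigma = (sum((v - mu) ** 2 for v in values) / len(values)) ** 0.5 or 1.0
    return [1.0 / (1.0 + math.exp(-(v - mu) / sigma)) for v in values]

def aggregated_score(v_re, v_mul, v_ang, v_std, multiplicity, max_multiplicity):
    # Multiplicity-based weight w = M_x / M_max (Equation (6)); the
    # equal-weight sum is an illustrative stand-in for Equation (5)
    w = multiplicity / max_multiplicity
    return w * (v_re + v_mul + v_ang + v_std)
```

Because the logistic saturates, a gross outlier in one feature can only push its normalized value towards 1, rather than stretching the whole scale as min-max normalization would.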

Filtering Threshold Identification-The Statistical Approach
Once an aggregated quality score is assigned to each 3D tie point (Equation (5)), the identification of suitable thresholds for keeping only good-quality 3D points proved to be a very challenging task. The limits for filtering points, based on their aggregated score, are computed as the sum of single-feature thresholds. The choice of using optimal values for each feature threshold, as presented in [16], hardly generalizes to every type of dataset and/or image block. Aiming to generalize the filtering methodology, i.e., to automatically identify suitable thresholds for each dataset, the statistical distribution of each quality feature is considered here. The analysis of the value distribution of each quality parameter is particularly helpful for outlier identification, as well as for highlighting wrongly computed tie points resulting from a weak image block geometry or from issues in the matching and adjustment steps.
From the theory of errors, the general assumption is that random errors in a set of data follow a normal distribution [31]. In the probability density function of a Gaussian distribution, the probability that the random error of a variable lies within certain limits is defined as a symmetrical factor of the standard deviation. A common approach to outlier removal is based on filtering data exceeding the "3σ rule": the mean value, plus or minus three standard deviations, is employed as the threshold for removing outliers, considering that 99.73% of the data fall within this range [60]. Other authors have suggested less conservative approaches, reducing the number of standard deviations considered around the mean [61,62].
Nevertheless, in laser scanning and photogrammetric data, a deviation from a normal trend is frequently observed, especially due to outliers. The normality assessment can be supported by statistical analyses and parameters, such as the Q-Q (quantile-quantile) plot or the Skewness and Kurtosis values [62]. Approaches more robust than the 3σ rule must be employed with large datasets heavily affected by outliers, or when the Skewness and Kurtosis values prove a non-normal behaviour of the error distribution [63,64]. In these cases, sample quantiles of the distribution are usually used. The quantile of a distribution is the inverse of its cumulative distribution function. The 50% quantile is the median of the distribution; it is a robust quality measure, less sensitive to outliers, that performs better with skewed distributions.
Another robust estimator is σ_MAD, derived from the Median Absolute Deviation (MAD), i.e., the median of the absolute deviations from the median of all data: σ_MAD = b · MAD, where b = 1.4826 is a scale factor valid under the assumption of normally distributed data, disregarding the non-normal behaviour introduced by outliers. In this work, the median and the median plus σ_MAD are considered as possible thresholds for each quality parameter, as explained in the next section. Tests adopting the median plus 3σ_MAD as feature thresholds provided negligible effects on the sparse point cloud filtering.
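The σ_MAD estimator and the resulting per-feature threshold can be computed in a few lines (function names are illustrative):

```python
def median(values):
    # Sample median (50% quantile)
    s = sorted(values)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

def sigma_mad(values, b=1.4826):
    # Robust scale estimate: sigma_MAD = b * median(|x_i - median(x)|),
    # with b = 1.4826 so that sigma_MAD matches sigma for normal data
    med = median(values)
    return b * median([abs(v - med) for v in values])

def feature_threshold(values):
    # The less aggressive threshold considered in the paper:
    # median + sigma_MAD, robust to the outliers that inflate mean + 3*sigma
    return median(values) + sigma_mad(values)
```

On a sample such as [1, 2, 3, 4, 100], the mean and standard deviation are dominated by the single outlier, while the median (3) and σ_MAD (≈1.48) are barely affected, which is why the MAD-based threshold generalizes better across datasets.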

Thresholds Tests
The robust statistical approaches described in Section 3.3.3 for the identification of parameter thresholds were tested, and the average improvements of the quality features were compared. In particular, using a dataset of 133 terrestrial and 87 UAV images (Section 4.1.1), the following filtering thresholds were considered: (1) median values for each quality parameter, without weighting the aggregation function; (2) median values for each quality parameter, weighting the aggregation function; (3) median plus σ_MAD, without weighting the aggregation function; (4) median plus σ_MAD, weighting the aggregation function. Table 3 reports the improvements of the computed quality parameters for the considered threshold approaches. Table 3. Average median improvement on quality parameters after applying different filtering thresholds.

Table 3 columns: Removed 3D Points | Re-Projection Error | Multiplicity | Inters. Angle | A-Post. St. Dev.
A more aggressive filtering approach (i.e., cases 1 and 2) produced a considerable improvement of the quality parameters. A less aggressive filtering (cases 3 and 4) delivers more homogeneous results and removes fewer 3D points. The encouraging effects of a less conservative method (such as case 2), also confirmed by the other datasets and tests, supported the choice of selecting the median values as feature thresholds. An overly aggressive filtering (such as case 1) removed too many tie points, and some images could not be oriented.

Experiments and Results
In the following sections, different case studies are presented to demonstrate the effectiveness and replicability of the developed methodology. From the joint orientation of the terrestrial and UAV images, about 405,000 3D tie points were derived in the sparse point cloud. Using the in-house Python tool, the initial quality parameters were extracted for each 3D tie point (Table 4). Aggregated median values were selected as the threshold in the filtering step (Section 3.3.4). About 280,000 3D tie points, with an aggregated score higher than the selected threshold, were automatically removed. The filtered set of tie points was then used to run a new bundle adjustment and refine the orientation results. Table 5 reports the values of the recomputed quality parameters and their average improvements with respect to the original sparse point cloud. The normality assessment of the a-posteriori standard deviation values before and after filtering, as well as the related Kurtosis and Skewness parameters, are shown in Figure 6.
For the external quality check, a laser scanner point cloud (spatial resolution of 1 mm) was acquired with a Leica HDS7000. A seven-parameter Helmert transformation was performed to co-register the photogrammetric and ranging point clouds. Then, the photogrammetric dense clouds obtained from the original processing and from the filtering procedure were compared with the reference laser scanning data. Using some selected areas (Figure 7), RMSEs (Root Mean Square Errors) of plane fitting procedures (Table 6) and cloud-to-cloud analyses (Table 7) were computed. In both cases, the quality improvement of the dense point cloud derived after the filtering procedure was remarkable.
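The cloud-to-cloud (C2C) comparison used for this external check computes, for each point of the evaluated cloud, the distance to its nearest neighbour in the reference laser scanning cloud. A minimal brute-force sketch (real tools use KD-trees or octrees for speed; the function name is illustrative):

```python
def cloud_to_cloud(cloud, reference):
    # Mean and maximum nearest-neighbour distance from each point of
    # `cloud` (3D tuples) to the `reference` cloud (brute force O(n*m))
    dists = []
    for p in cloud:
        best = min(sum((a - b) ** 2 for a, b in zip(p, r)) for r in reference)
        dists.append(best ** 0.5)
    return sum(dists) / len(dists), max(dists)
```

A drop in the mean C2C distance after filtering, as reported in Table 7, indicates that the filtered dense cloud lies closer to the laser scanning ground truth.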
Figure 7. Selected areas for the plane fitting evaluation (a, Table 6) and for the cloud-to-cloud distance analysis (b, Table 7). Finally, a visual evaluation of the quality improvement is shown in Figure 8, where noise reduction and a more detailed reconstruction of the marble surfaces are visible.

Nettuno Temple
The "Nettuno Temple" in the Archaeological Site of Paestum (Italy) [65] was surveyed, combining some 214 UAV images and 680 terrestrial images (Figure 9). For the UAV-based images, a Canon EOS 550D (pixel size 4.4 µm) with a 25 mm lens was employed (average GSD: 3 mm), while the terrestrial images were acquired with a Nikon D3X (pixel size 5.9 µm) and a 14 mm lens (average terrestrial GSD: 2 cm).

The joint processing of UAV and terrestrial images produced a sparse point cloud with some 640,000 3D tie points. Results of the quality parameters extracted from the derived sparse point cloud are presented in Table 8. Once a suitable threshold was identified (Section 3.3.4), the automatic filtering procedure returned a new set of about 187,000 3D tie points (~71% of the original sparse cloud was removed).
Results and improvements of the inner quality parameters for the Nettuno Temple dataset after running a new bundle adjustment are shown in Table 9. Table 9. Values of quality parameters computed on the filtered sparse point cloud (~187,000 3D tie points) and average variations of the metrics. To further evaluate the improvements of the filtering procedure with respect to external information, RMSEs on 30 total station check points (Table 10) and cloud-to-cloud analyses on five sub-areas (Figure 10 and Table 11), surveyed with laser scanning (spatial resolution 5 mm), were computed. For this case study, the improvement brought by the proposed filtering procedure was very evident. Finally, some visual comparisons of the dense point clouds produced with the standard procedure ("original") and the proposed filtering procedure ("filtered") are provided in Figure 11.

Figure 10. Selected five areas for cloud-to-cloud distance analyses between the laser scanning ground truth and the two photogrammetric clouds.
Table 11. Cloud-to-cloud distance analyses on the original and filtered dense clouds and average variation of the mean values (~−44%).
Finally, some visual comparisons of the dense point clouds produced with the standard procedure ("original") and the proposed filtering procedure ("filtered") are provided in Figure 11.
From the joint processing of UAV and terrestrial images, some 1.2 million 3D tie points were computed and their quality parameters extracted, as shown in Table 12. Adopting the proposed filtering procedure, some 717,000 tie points were removed (~61%). Results of the inner quality parameter variations are shown in Table 13.
For the external check, a laser scanner point cloud acquired with a Leica HDS7000 was employed (spatial resolution: 1 mm). Results of the cloud-to-cloud distance procedure on five sub-areas (Figure 13) are presented in Table 14. Qualitative and visual analyses and improvements are shown in Figure 14.
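The cloud-to-cloud comparisons reported in this and the following case studies boil down to nearest-neighbour distances between the photogrammetric cloud and the laser scanning ground truth. The sketch below is a minimal illustration of that check, not the actual tool used in the paper; it assumes both clouds are already registered in the same reference frame, and the noise level is synthetic.

```python
import numpy as np
from scipy.spatial import cKDTree

def cloud_to_cloud(evaluated: np.ndarray, reference: np.ndarray):
    """Mean and std of nearest-neighbour distances from each point of the
    evaluated (photogrammetric) cloud to the reference (laser) cloud."""
    tree = cKDTree(reference)            # spatial index on the ground truth
    dist, _ = tree.query(evaluated, k=1)
    return dist.mean(), dist.std()

# toy check: a copy of the reference cloud with ~5 mm Gaussian noise
rng = np.random.default_rng(1)
reference = rng.uniform(0.0, 10.0, size=(5000, 3))   # coordinates in metres
evaluated = reference + rng.normal(0.0, 0.005, size=reference.shape)
mean_d, std_d = cloud_to_cloud(evaluated, reference)
print(f"mean C2C distance: {mean_d * 1000:.1f} mm")
```

With isotropic 5 mm noise, the mean nearest-neighbour distance comes out around 8 mm, consistent with the expectation for 3D Gaussian perturbations.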


The ISPRS/EuroSDR Dortmund Benchmark
The central part of the Dortmund benchmark [66] was considered. It contains the City Hall (Figure 15), seen in 163 terrestrial images (average GSD: 3 mm) acquired with a Sony NEX-7 camera (16 mm lens, 4 µm pixel size) and in 102 UAV images (average GSD: 2 cm) captured with a Canon EOS 600D (20 mm lens, 4.4 µm pixel size).

Statistics computed on the extracted quality parameters of the original sparse point cloud (~315,000 tie points) are shown in Table 15, while results on the filtered tie point set (~70,000 points) are given in Table 16. As an external check, laser scanner point data of the City Hall (2 mm resolution step) were assumed as ground truth. Results of the cloud-to-cloud distance on five sub-areas (Figure 16) are shown in Table 17.
Figure 16. Selected five areas for cloud-to-cloud distance analyses between the laser scanning ground truth and the two photogrammetric clouds.
As a further check, RMSEs on 15 check points measured with a total station were evaluated, together with the average improvement obtained by applying our method (Table 18). Moreover, some qualitative comparisons (Figure 17) between the two final dense reconstructions are presented.
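The check point RMSEs used throughout these external evaluations follow the standard per-axis definition. A minimal sketch (the coordinates and error magnitude below are illustrative, not the benchmark data):

```python
import numpy as np

def checkpoint_rmse(measured: np.ndarray, estimated: np.ndarray) -> np.ndarray:
    """Per-axis RMSE between reference check point coordinates (e.g. from a
    total station survey) and their photogrammetric estimates (both Nx3, metres)."""
    return np.sqrt(np.mean((estimated - measured) ** 2, axis=0))

# toy example: 15 check points with ~2 mm isotropic error
rng = np.random.default_rng(2)
measured = rng.uniform(0.0, 50.0, size=(15, 3))
estimated = measured + rng.normal(0.0, 0.002, size=measured.shape)
rmse_x, rmse_y, rmse_z = checkpoint_rmse(measured, estimated)
print(f"RMSE X/Y/Z: {rmse_x:.4f} / {rmse_y:.4f} / {rmse_z:.4f} m")
```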

Discussion
The presented work extends two previous studies [16,17], proposing a new approach that achieves even better improvements in the 3D reconstruction results with a more flexible and less time-consuming procedure. Limitations and weaknesses of the previous methods pushed the authors to develop the current work, based only on photogrammetric quality parameters and a robust statistical approach for the filtering step. Statistically defined thresholds, instead of strict and idealized values as in [16], avoid the risk of excessive filtering of tie points and increase the feasibility of the procedure. The proposed weighted aggregation function (Section 3.3.2), based on the tie point multiplicity, avoids the time-consuming testing of several weight combinations for each quality parameter. Compared with the geometrical method presented in [17], the current filtering procedure is based only on photogrammetric parameters and proved to be more efficient. In geometrical approaches, the filtering results strongly depend on the identification of optimal radii for the feature extraction, which proved to be a very challenging task when working on sparse data. With respect to similar filtering approaches based on quality analyses of the computed 3D tie points [52,53], our method considers several combined photogrammetric parameters to define the quality of the reconstructed 3D points. The novelty and importance of their aggregation, as presented in Section 3.3.1, stems from the insufficient improvements obtained with single-feature filtering.
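As a rough sketch of this idea, a multiplicity-weighted aggregation with a statistically derived threshold might look as follows. The normalization scheme, the specific quality features and the percentile cut-off are illustrative assumptions for the sketch, not the exact formulation of Sections 3.3.1-3.3.4.

```python
import numpy as np

def aggregate_quality(reproj_err, proj_acc, multiplicity):
    """Combine per-point quality features into one score (lower = better),
    down-weighting the penalty of tie points observed in many images."""
    def minmax(x):
        x = np.asarray(x, dtype=float)
        return (x - x.min()) / (x.max() - x.min() + 1e-12)
    score = minmax(reproj_err) + minmax(proj_acc)
    return score / np.log2(np.asarray(multiplicity) + 1.0)

def filter_points(score, pct=75.0):
    """Keep points whose aggregated score lies below a threshold derived
    from the score distribution itself, instead of a fixed ideal value."""
    return score <= np.percentile(score, pct)

# toy sparse cloud: 1000 tie points with synthetic quality features
rng = np.random.default_rng(0)
reproj = rng.gamma(2.0, 0.5, size=1000)   # reprojection error (px)
acc = rng.gamma(2.0, 1.0, size=1000)      # projection accuracy (px)
mult = rng.integers(2, 12, size=1000)     # number of observing images
keep = filter_points(aggregate_quality(reproj, acc, mult))
print(f"kept {keep.mean():.0%} of the tie points")
```

Because the cut-off adapts to the empirical score distribution, a dataset with overall poor quality is not wiped out the way a fixed threshold would wipe it out, which is the feasibility gain discussed above.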
Quantitative and qualitative results and analyses of the developed method have been extensively reported for four case studies. To verify the robustness and effectiveness of the procedure, the presented datasets differ in the number of processed images, the employed sensors and the camera network geometries. A summary of the main results and average improvements of the reconstructions obtained by adopting the proposed filtering workflow is presented in Table 19. Although the results clearly show the relevant benefits of adopting the presented method for refining the image orientation, some issues and limitations of the procedure have to be highlighted: (a) the filtering method does not fully consider the tie point distribution in image space; too aggressive a filtering could prevent the orientation of some images during the orientation refinement. This issue is partially solved by adopting more relaxed thresholds, as presented in Section 3.3.4, and by weighting the aggregation function. (b) The presented method has not yet been verified on multi-temporal datasets. (c) The computation of the quality features and the filtering procedure are performed with an in-house developed tool (https://github.com/3DOM-FBK/Geometry), which has so far been tested only in combination with the outputs exported from the commercial software Agisoft Metashape [67]. Some issues with file format compatibility could arise when testing our filtering tool in combination with other open-source or commercial software.

Conclusions and Future Works
This paper presented an enriched photogrammetric workflow for improving 3D reconstruction results when terrestrial and UAV images are jointly processed. In recent years, the unquestionable advantages of integrating these different sensor imageries have encouraged their use in several fields, from the building to the urban scale. Nevertheless, varying sensor characteristics and environmental constraints, as well as irregular and unconventional camera network configurations, commonly affect the quality of the image orientation and 3D reconstruction results. This work deals with these processing issues, proposing an extended photogrammetric workflow in which a filtering step is performed on the initial image orientation results. In the developed procedure, the initial sparse point cloud is filtered, a new bundle adjustment is performed to refine the camera parameters and, finally, a dense 3D reconstruction is generated. The filtering step is based on the evaluation of quality parameters for each 3D tie point computed within the bundle adjustment. The aggregation of these quality metrics allows the removal of low-quality tie points before refining the orientation results in a new adjustment. Thresholds for removing low-quality points are based on the statistical distribution of the considered quality parameters. The proposed robust statistical approach extends the feasibility of a previous method [16] to datasets of different quality, and it is time-efficient compared with other geometrical approaches [17].
The effectiveness of the developed procedure was tested on several datasets, all featuring terrestrial and UAV images and covering different scenarios. Relevant improvements of the results were demonstrated using internal and external quality checks. The visual and qualitative comparisons of the dense reconstructions, generated with the standard and the enriched workflow, confirm the benefits of the procedure.
Further tests will investigate the effectiveness and robustness of the proposed method on multi-temporal datasets, where complex network geometries and scene changes usually yield unsatisfying 3D reconstruction results. In addition, the filtering procedure will be extended to also consider the tie point distribution in image space. This could help preserve a homogeneous distribution of the tie points in the images and prevent overly aggressive filtering.