A Low Cost UWB Based Solution for Direct Georeferencing UAV Photogrammetry

Thanks to their flexibility and availability at reduced costs, Unmanned Aerial Vehicles (UAVs) have been recently used on a wide range of applications and conditions. Among these, they can play an important role in monitoring critical events (e.g., disaster monitoring) when the presence of humans close to the scene shall be avoided for safety reasons, in precision farming and surveying. Despite the very large number of possible applications, their usage is mainly limited by the availability of the Global Navigation Satellite System (GNSS) in the considered environment: indeed, GNSS is of fundamental importance in order to reduce positioning error derived by the drift of (low-cost) Micro-Electro-Mechanical Systems (MEMS) internal sensors. In order to make the usage of UAVs possible even in critical environments (when GNSS is not available or not reliable, e.g., close to mountains or in city centers, close to high buildings), this paper considers the use of a low cost Ultra Wide-Band (UWB) system as the positioning method. Furthermore, assuming the use of a calibrated camera, UWB positioning is exploited to achieve metric reconstruction on a local coordinate system. Once the georeferenced position of at least three points (e.g., positions of three UWB devices) is known, then georeferencing can be obtained, as well. The proposed approach is validated on a specific case study, the reconstruction of the façade of a university building. Average error on 90 check points distributed over the building façade, obtained by georeferencing by means of the georeferenced positions of four UWB devices at fixed positions, is 0.29 m. For comparison, the average error obtained by using four ground control points is 0.18 m.


Introduction
The use of Unmanned Aerial Vehicles (UAVs) is becoming more and more frequent all over the world. Eased especially by the availability of low cost systems, the number of UAV applications has grown steadily over the last decade [1][2][3]. Indeed, a number of factors (the affordable cost, the advantageous portability, limited size and weight, the possibility of flying over areas difficult to reach on the ground, the ability to quickly acquire a quite large amount of information of a relatively large area while still ensuring the possibility of quite close views, if needed) make UAVs an attractive monitoring and surveying instrument in a wide range of conditions. Comprehensive reviews of the state of the art for UAV applications, positioning strategies and imaging sensors can be found in [4,5] and in the references therein.
Despite the fact that portability is a key factor in the current success of these devices, it comes with restrictions on the admissible payload [6]: this implies a careful selection of the instrumentation mounted on the UAV. In particular, positioning and navigation are typically achieved by using lightweight Global Navigation Satellite System (GNSS) receivers and Micro-Electro-Mechanical System (MEMS)-embedded sensors (i.e., integrated use of GNSS and the Inertial Navigation System (INS) [7,8]); recent commercial solutions with lightweight multi-frequency GNSS receivers include, for instance, Javad [9], Topcon [10] and Maxtena [11] antennas, and drones with native multi-frequency GNSS receivers, such as senseFly eBee [12]. UAV applications are actually enabled by the availability of GNSS; indeed, INS cannot be considered as a stand-alone solution because of the typical drift of the position and attitude states estimated by means of MEMS sensors.
Unfortunately, GNSS is not available or not reliable in certain operating conditions, e.g., in mountainous areas, close to high buildings, or indoors. In these cases, other viable solutions shall be considered in order to enable reliable positioning (typically by the integration of information provided by different sensors [13][14][15][16][17]).
This paper investigates an alternative solution to enable positioning when GNSS cannot be used. The considered system can also be exploited to obtain metric reconstruction without the need for Ground Control Points (GCPs). In this approach, reconstruction is achieved through the following two steps:
• use of surveyed low-cost Ultra-Wide Band (UWB) transmitters in order to estimate UAV position,
• use of photogrammetry in a Direct Georeferencing (DG) fashion to obtain metric reconstructions.
In a UWB positioning system [18,19], each device is a radio transmitter and receiver that provides ranging measurements once connected to another UWB device. Ranging can be obtained either from received signal strength considerations or from the time of flight of the radio signal, where the latter method typically provides much more accurate measurements when the two devices are in a Clear Line Of Sight (CLOS), which is often a quite reasonable working condition when dealing with UAVs. UWB-based positioning has recently attracted attention thanks to its potentially high accuracy (with accurate calibration [13,20,21]) and its ability to partially pass through (not so thick) obstacles. In particular, its use has been recently investigated for indoor positioning with good results (a typical 3D positioning error of a few decimeters [14]). This work is based on the use of the Pozyx UWB positioning system [22]. Pozyx devices are small and lightweight UWB transmitters (maximum side size is 6 cm; volume is approximately 100 cm³; weight is 12 g), and they can be powered by portable batteries through standard micro-USB ports (power consumption is usually not an issue because transmit power is very low; battery life, which actually depends on the specific battery used, is typically much longer than that of UAV batteries). The cost of a Pozyx system with minimum requirements for obtaining 3D positioning is currently 600 € [22]. Range measurements are based on the time of flight of the UWB signal, with a nominal ranging accuracy of 10 cm, nominal maximum range of 100 m and nominal maximum update rate of 80 Hz. Communications between devices are based on a Time Division Multiple Access (TDMA) approach [22]. Once turned on, each Pozyx device auto-detects the presence of the other devices, and after a few seconds, the system can be used to collect ranging data. The actual update rate of the range measurement and the real maximum range of a UWB device can vary depending on the system settings (e.g., transmit power, length of preamble signal, number of devices connected to the system). The firmware version installed on the devices might also affect system performance. An experimental characterization of UWB positioning performance is provided in Section 2. Furthermore, in order to improve positioning results obtained by the UWB system, vision-based positioning is investigated, as well (Section 3).
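As a rough illustration of what time-of-flight ranging implies for timing accuracy, the sketch below converts between flight time and distance. It assumes an idealized one-way flight time with no clock offset, antenna delay or multipath; the function names are hypothetical and do not correspond to the Pozyx API.

```python
# Idealized time-of-flight ranging: distance = c * flight time.
# Assumption: one-way flight time, perfectly synchronized clocks.
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def tof_to_range(t_flight_s: float) -> float:
    """Convert a one-way time of flight (seconds) to a distance (metres)."""
    return SPEED_OF_LIGHT * t_flight_s


def range_to_tof(distance_m: float) -> float:
    """Convert a distance (metres) to the corresponding one-way flight time (seconds)."""
    return distance_m / SPEED_OF_LIGHT


# Timing resolution matching the 10 cm nominal ranging accuracy, in nanoseconds.
timing_for_10cm_ns = range_to_tof(0.10) * 1e9
# Flight time over the 100 m nominal maximum range, in nanoseconds.
tof_for_100m_ns = range_to_tof(100.0) * 1e9
```

Under these assumptions, a 10 cm ranging accuracy corresponds to a timing accuracy of roughly a third of a nanosecond, which suggests why device calibration and system settings can have a visible effect on the measured ranges.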
Photogrammetric reconstruction is obtained by means of a compact camera mounted on the UAV. Reliable photogrammetric 3D reconstruction is often achieved by introducing control points in the scene, with known positions [23,24]. The use of control points can be extremely useful to improve the reconstruction and georeferencing accuracy of aerial photogrammetry; however, their use is difficult in certain cases, e.g., surveys during natural disasters in areas difficult to reach on the ground. In these cases, Direct Georeferencing (DG) is typically considered instead [25][26][27][28][29][30], i.e., the direct estimation of camera position and orientation with sensors mounted on the UAV (e.g., GNSS and INS). Motivated by the above considerations, DG has attracted the attention of several research groups in the last few years: inexpensive and lightweight sensor solutions are typically considered [25,28], where DG positioning accuracy is mainly related to the performance of the consumer-grade GNSS receiver in the integrated Position and orientation System (PoS) (e.g., ∼5 m [6,28], whereas higher grade GNSS receivers can allow better positioning accuracy [31]). Similarly to DG, here, 3D reconstruction is obtained without using control points: in order to make the system usable even when GNSS is not available, UWB positioning is used instead. It is worth noticing that UWB positioning allows obtaining 3D reconstruction in a local coordinate system, whereas the global coordinates of (at least) three points have to be available for georeferencing the obtained reconstruction. Furthermore, despite the fact that camera self-calibration might be considered [32,33], to reduce reconstruction errors and computational burden, the camera is assumed to be pre-calibrated [34,35], and hence, an appropriate estimate of camera interior parameters is assumed to be available [36]. Nevertheless, the results of the camera self-calibration case (done with Agisoft PhotoScan [37]) are also presented for comparison in Section 5.
To be more specific, in the experimental results shown in Section 5, a Canon G7X camera is mounted on a 3D Robotics IRIS UAV: the camera is positioned in order to acquire oblique aerial images of a university building, whose reconstruction is considered as the case study (Figure 1 shows the façade of the building considered as the case study). Despite the fact that the system is thought to be usable also in different environments, the considered case study has the clear advantage of allowing easy terrestrial laser scanning data acquisition, which is used for comparison, in order to validate photogrammetric reconstruction results. On the other hand, the considered case study does not allow providing information on the behavior of the UWB devices in other working conditions, which can also be of interest in several applications, e.g., in cluttered environments. This will be the subject of our future investigation. Finally, it is worth noticing that the camera position is fixed and close to the UWB device mounted on the UAV: the position difference between the two is typically smaller than the UWB positioning error, and it can be partially compensated if the system is appropriately calibrated and georeferencing and estimates of the attitude of the UAV are available.
The paper is organized as follows: Section 2 analyzes the ranging characteristics of the considered UWB devices. UWB positioning performance and the real-time use of visual information for improving positioning results are considered in Section 3. Section 4 reviews the photogrammetric reconstruction with the introduction of UWB positioning information. Section 5 presents the results of the overall method. Final remarks and conclusions are reported in Sections 6 and 7.

Characterization of UWB Ranging
Positioning with the radio signal is usually achieved by using either the signal strength of the received signal or the time of flight [18]: the first is usually less influenced by changes in the environment, whereas the latter typically provides more accurate results (when measurements are in CLOS). It is worth noticing that in several UAV applications, it is quite realistic to assume that at least a subset of the considered devices is in CLOS; hence, the second approach (which is that implemented in the Pozyx system) shall be preferred.
A performance analysis of the Pozyx system considered in this work, which is based on the time of flight ranging measurement, is presented in the following. By convention, the UWB device whose position has to be tracked will be called the tag, whereas the other UWB devices will be named anchors in the following. In practice, the tag collects measurements of distances with respect to the anchors.
First, Figure 2a shows the distribution of the ranging error for eight UWB anchors when moving the tag along a track of 40 check points (whose positions are known with higher accuracy, i.e., millimetric level; the same track has been repeated several times in order to achieve a sufficiently robust statistical result): the minimum and maximum distances from an anchor to the tag are 1.8 m and 15 m, respectively. The relative orientation of the tag with respect to the anchors changes extensively during the track. It is worth noticing that in order to evaluate ranging errors with high accuracy, results reported in this section have been obtained in terrestrial experiments (aerial positioning measurements, i.e., during flight, are much less accurate due to the presence of an insufficiently accurate GNSS receiver on the considered UAV). Due to the extensive change of orientation and distance, the results shown in Figure 2a shall be considered as representative of the system performance in an average case (i.e., when the orientation and position of the tag vary over a quite large interval of values during the navigation). According to this observation, Figure 2a shows that the ranging error (in the average case) is approximately 15 cm (standard deviation). Furthermore, the error distribution is quite close to normal, and the bias is close to zero.
The results of Figure 2a are mostly confirmed also when considering just one device (Figure 2b): bias is larger than in Figure 2a, but still much smaller than the standard deviation (0.7 cm versus 16 cm, for bias and standard deviation, respectively). The ranging error distribution is also less close to normal than in Figure 2a: from this point of view, this device represents the worst case among the considered ones. Although the ranging error distribution of the other devices has a nicer appearance (i.e., closer to normal), absolute values of the bias and standard deviation are similar for all of the devices.
Hence, according to the results shown in Figure 2a,b, the ranging error (when dealing with a large variety of anchor orientations and positions) is close to zero mean with (approximately) a 15-cm standard deviation, which is similar to, but larger than, the nominal value, i.e., 10 cm.
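The bias and standard deviation quoted above are the usual first- and second-order statistics of the ranging error. A minimal sketch of how they could be computed from measured ranges and reference distances follows; the numerical values are made up for illustration and are not data from the experiments.

```python
from statistics import mean, pstdev


def ranging_error_stats(measured_m, reference_m):
    """Return (bias, standard deviation) of the ranging error in metres."""
    errors = [m - r for m, r in zip(measured_m, reference_m)]
    return mean(errors), pstdev(errors)


# Hypothetical measurements of a 5 m reference distance (illustrative only).
measured = [5.10, 4.85, 5.20, 4.95, 5.05]
reference = [5.00] * len(measured)
bias, sigma = ranging_error_stats(measured, reference)
```

With these illustrative values, the bias is 3 cm and the standard deviation is about 12 cm.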
In order to provide a more detailed analysis of the device behavior, Figure 3a,b reports the average ranging errors between two UWB devices when varying the range (from 2 m to 40 m, for a fixed value of the relative orientation between devices) and the relative orientation between devices (from 0 to 2π, for two values of the distance, i.e., 3.10 m and 3.25 m, where two different devices were positioned at these distances). Each of the values shown in Figure 3a,b is the average of 1000 time samples (outliers, probably due to synchronization issues between the devices, cause certain larger values for certain error bars). It is worth noticing that the curves shown in Figure 3b have a partially common trend, with a difference of 40 to 60 cm. According to Figure 3a,b, systematic error varies significantly with respect to both distance and relative orientation between the devices. For a time-invariant system configuration, i.e., non-varying environment and fixed device relative positions and orientations, residual errors (once discarding the bias, considered as systematic error, and filtering out outliers) have a standard deviation σ ∼ 3.7 cm (Figure 3a). As shown in Figure 3a, the standard deviation of the residual error is approximately constant when varying the relative distance between devices.
The maximum measurable range of the system is supposed to be approximately 100 m. However, due to environmental noise and to the low signal strength, the real average frequency of the range acquisition decreases significantly as the distance between the devices increases. In practice, the flight track and UWB device positions have to be planned in order to ensure that at each time instant, the distance between at least four anchors and the tag is smaller than a predetermined threshold (e.g., well below 40 m, where the actual threshold value shall be determined depending on the specific working conditions). In the experimental results shown in Section 5, the time lag between two range measurements received by the tag is 20 ms, i.e., an update frequency of 50 Hz, which is lower than the maximum nominal one (80 Hz). However, the system settings considered in our experiments aim at having the largest maximum range, and this condition clearly leads to a real update frequency lower than the maximum one.
Calibration can be considered in order to mitigate systematic error; however, as shown in Figure 3, its behavior is quite complex; hence, its compensation is probably too laborious for an end user: consequently, in order to make the proposed system's usage as simple as possible for an end user, the systematic error shown in Figure 3 is not compensated in this paper.
In practice, motivated by the results shown in Figure 3a and by the necessity of using a simple measurement model, ranging errors of all anchors with the tag will be modeled as independent, zero mean and with the same standard deviation $\sigma_{uwb}$. However, it is worth noticing that since systematic error has not been compensated (and it has a complex behavior, as shown in Figure 3), $\sigma_{uwb}$ will typically be larger than 15 cm, in particular when a short flight is considered (i.e., when the variation of distances and relative orientations is not sufficient to be considered an average case, as that of Figure 3a; see also the UWB positioning results shown in Section 3).

UWB and Vision-Aided Positioning
The Pozyx system provides a native positioning functionality. However, the execution of the positioning task can slow down the range acquisition sampling rate of the system. Hence, in our current implementation, ranging measurements are just collected by the tag and post-processed off-line. This does not affect the considerations presented in the following sections on the metric reconstruction ability of the system. However, in order to use UWB positioning also in real time, either the system sampling rate has to be reduced or the positioning task has to be executed on an external device provided with more computational power.
Static 3D UWB positioning can be obtained from ranging measurements of at least four anchors (trilateration problem [38][39][40][41]; Appendix A). Anchor positions have to be known in order to make the tag positioning problem solvable. This can be achieved either by using external measurement instruments (e.g., total station, measuring tapes) or by self-positioning by means of UWB range measurements (Appendix B). The first option shall be preferred, if possible, because accurate estimates of anchor positions are very important for the overall performance of the system; however, the latter can be interesting because it does not require the use of other instruments. Section 5 presents both the results obtained with anchor positioning with the Leica TCR 702 Total Station (2″ angular accuracy, ≈2 mm distance accuracy) and the results of a simulation of anchor positioning with their own range measurements.
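As an illustration of the static trilateration problem, the following sketch estimates a tag position from ranges to known anchors by Gauss-Newton least squares. It is not necessarily the method of Appendix A; the function name, the anchor layout and the initial guess (placed above the anchors, as for a flying UAV) are assumptions made for the example.

```python
import numpy as np


def trilaterate(anchors, ranges, x0, iters=50, tol=1e-9):
    """Gauss-Newton estimate of a 3D position from ranges to known anchors.

    anchors: (n, 3) anchor positions, n >= 4; ranges: (n,) measured distances;
    x0: initial guess (it must not coincide with an anchor position).
    """
    anchors = np.asarray(anchors, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        diff = x - anchors                       # vectors from anchors to the estimate
        dist = np.linalg.norm(diff, axis=1)      # predicted ranges
        jacobian = diff / dist[:, None]          # d||x - q_i|| / dx
        step, *_ = np.linalg.lstsq(jacobian, ranges - dist, rcond=None)
        x = x + step
        if np.linalg.norm(step) < tol:
            break
    return x


# Hypothetical anchor layout at different altitudes, noise-free ranges.
anchors = np.array([[0, 0, 0], [10, 0, 1], [0, 10, 2], [10, 10, 4]], dtype=float)
true_position = np.array([3.0, 4.0, 15.0])
ranges = np.linalg.norm(true_position - anchors, axis=1)
estimate = trilaterate(anchors, ranges, x0=[5.0, 5.0, 10.0])
```

When anchors are close to coplanar, a mirrored solution on the other side of the anchor plane can fit the ranges almost as well, which is one reason why the initial guess and the spread of anchor altitudes matter.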
UWB ranging data collected during the flight can be integrated with IMU measurements by means of an Extended Kalman Filter (EKF) [42,43]; however, according to the principle of making the system as simple as possible for the user, IMU sensors are not calibrated and hence provide information that does not significantly improve the positioning results obtained by using only UWB measurements. Consequently, the considered positioning algorithm exploits only the information provided by UWB range measurements.
The proposed positioning method is as follows:
1. First, an EKF computes an initial estimate of the tag positions (this task can be computed in real time, if needed).
2. Then, the tag trajectory is optimized by using all of the collected measurements (this task has to be performed off-line). Since the considered photogrammetric reconstruction relies on the estimated positions in order to estimate the correct scale (and is also involved in the georeferencing task), the goal of this operation is that of improving as much as possible the estimated positions in order to ensure good reconstruction results.
Time synchronization of UWB devices is automatically performed by the Pozyx system, which provides a time stamp for each range measurement. Anchors are sequentially checked by the tag in order to collect the available range measurements. Let T be the time period characterizing one cycle of sequential data acquisitions, where the tag collects all available ranges from the anchors.
Since the EKF provides just an initial estimate of the UAV positions, the time lag between measurements in the same acquisition cycle is neglected in its formulation, leading to the following model:

$$\begin{bmatrix} x_{t+1} \\ \dot{x}_{t+1} \end{bmatrix} = \begin{bmatrix} I_3 & T\,I_3 \\ 0 & I_3 \end{bmatrix} \begin{bmatrix} x_t \\ \dot{x}_t \end{bmatrix} + w_t \qquad (1)$$

$$r_t = \begin{bmatrix} \vdots \\ \| x_t - q_i \| \\ \vdots \end{bmatrix} + v_t \qquad (2)$$

where $x_t$ and $\dot{x}_t$ are the estimated tag position and velocity, $r_t$ contains all available ranges, $q_i$ is the position of the i-th anchor, $I_3$ is the identity matrix of size 3 × 3, $w_t$ and $v_t$ are zero-mean random noises with diagonal covariance matrices $Q$ and $R_t = \sigma_y^2 I_{n_t}$, where $n_t$ is the number of available measurements at time t. Values in $r_t$ depend on $x_t$, $\{q_i\}_i$, and on the availability of the corresponding measurements at time t.
It is worth noticing that the state update Equation (1) is linear, whereas the measurement Equation (2) is nonlinear. The row in $r_t$ corresponding to the i-th anchor represents the measurement of $\| x_t - q_i \|$, where $\| \cdot \|$ is the Euclidean norm. Consequently, linearization shall be considered for (2). For simplicity of notation, assume that only the measurement from anchor i is available; then the partial derivative of $r_t$ with respect to the state (to be used in the EKF) is:

$$\frac{\partial r_t}{\partial \begin{bmatrix} x_t^\top & \dot{x}_t^\top \end{bmatrix}} = \begin{bmatrix} \dfrac{(\hat{x}_t - q_i)^\top}{\| \hat{x}_t - q_i \|} & 0_{1 \times 3} \end{bmatrix} \qquad (3)$$

where $\hat{x}_t$ is the predicted tag position at time t given only the previous measurements. When a good estimate of the initial position $x_0$ is not available, its static estimate can be used (see Appendix A).
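The prediction and update steps of such a filter can be sketched as follows. This is a minimal illustration assuming a constant-velocity state transition over the cycle period T and range-only measurements, with unavailable ranges marked as NaN; the function name and all numerical values are hypothetical and do not reproduce the authors' implementation.

```python
import numpy as np


def ekf_step(state, P, ranges, anchors, T, Q, sigma_y):
    """One EKF cycle for state = [position (3), velocity (3)].

    ranges[i] is the measured distance to anchors[i], or NaN when unavailable.
    """
    # Prediction with a constant-velocity model (linear state update).
    F = np.block([[np.eye(3), T * np.eye(3)],
                  [np.zeros((3, 3)), np.eye(3)]])
    state_pred = F @ state
    P_pred = F @ P @ F.T + Q

    avail = ~np.isnan(ranges)
    n_t = int(avail.sum())
    if n_t == 0:
        return state_pred, P_pred

    # Linearized range measurement: one row per available anchor.
    diff = state_pred[:3] - anchors[avail]
    dist = np.linalg.norm(diff, axis=1)
    H = np.hstack([diff / dist[:, None], np.zeros((n_t, 3))])
    R = sigma_y ** 2 * np.eye(n_t)

    # Standard Kalman update.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    state_new = state_pred + K @ (ranges[avail] - dist)
    P_new = (np.eye(6) - K @ H) @ P_pred
    return state_new, P_new


# Hypothetical demo: static tag, noise-free ranges, wrong initial position.
anchors = np.array([[0, 0, 0], [10, 0, 1], [0, 10, 2], [10, 10, 4]], dtype=float)
true_position = np.array([3.0, 4.0, 15.0])
ranges = np.linalg.norm(true_position - anchors, axis=1)
state = np.array([4.0, 5.0, 14.0, 0.0, 0.0, 0.0])
P = np.eye(6)
for _ in range(200):
    state, P = ekf_step(state, P, ranges, anchors, T=0.02,
                        Q=1e-3 * np.eye(6), sigma_y=0.15)
final_error = np.linalg.norm(state[:3] - true_position)
```

In this sketch, the number of rows of the measurement Jacobian changes from cycle to cycle with the number of available ranges, which mirrors the time-varying size of the measurement covariance.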
Once the EKF estimated positions are available, they are used as initial conditions in the following optimization: where $X_t = [x_t^\top \ \dot{x}_t^\top]^\top$. Sums are considered for all time instants $\{t\}$ of the UWB measurement cycles, and index i is limited to only the available measurements at the considered cycle t. The above equation exploits the estimated velocity at t and the time lag $(t_i - t)$ in order to take into account asynchronous UWB measurements. Since range measurements are acquired sequentially at a constant sample frequency, a similar approach can be considered where state values are estimated at each time of range acquisition (i.e., for n times more time instants than when considering only estimates for cycles of measurements, where n is the number of anchors). However, this increases by a factor n the number of parameters to be estimated, which can lead to much longer computational time, in particular for long trajectories. Given also the quite regular UAV trajectory in our case study, the first option has been considered, and linear interpolation has been used in order to obtain estimates in intermediate time instants, if needed.
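One cost function consistent with the description above is a sum of squared range residuals in which each range is predicted at its own acquisition time by propagating the cycle position with the cycle velocity. This is an assumed form, not necessarily the exact expression used by the authors; in particular, whether the original cost also includes a term penalizing deviations from the dynamic model cannot be inferred from the text. The symbol $\mathcal{A}_t$ (the set of anchors available in cycle t) and the notation $r_{i,t}$ (the range to anchor i acquired at time $t_i$ within cycle t) are introduced here for illustration only.

```latex
\{\hat{X}_t\}_t \;=\; \arg\min_{\{X_t\}_t} \;
\sum_{t} \sum_{i \in \mathcal{A}_t}
\Big( r_{i,t} - \big\| \, x_t + (t_i - t)\,\dot{x}_t - q_i \, \big\| \Big)^2
```

With this form, the number of unknowns is six per measurement cycle, whereas estimating a state at each range acquisition time would multiply the number of unknowns by the number of anchors.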
Outlier rejection of UWB measurements is also implemented in order to reduce the influence of measurements not in CLOS. The implemented procedure is actually quite simple. For each estimated $x_t$, measurements with an error larger than a certain threshold (2 m in our experiments) are neglected. Other more complex outlier rejection strategies can be considered, as well [44].
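The thresholding rule described above can be sketched as follows, assuming that the error of a measurement is the absolute difference between the measured range and the distance predicted from the current position estimate. The function name and the data are hypothetical.

```python
import math


def select_inlier_ranges(position, anchors, ranges, threshold_m=2.0):
    """Return the indices of the ranges consistent with the estimated position.

    A range is kept when |measured - predicted| <= threshold_m.
    """
    kept = []
    for i, (anchor, measured) in enumerate(zip(anchors, ranges)):
        predicted = math.dist(position, anchor)  # Euclidean distance to the anchor
        if abs(measured - predicted) <= threshold_m:
            kept.append(i)
    return kept


# Hypothetical example: the third range is corrupted (e.g., not in CLOS).
anchors = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
position = (0.0, 0.0, 10.0)
ranges = [10.1, 14.0, 18.5]
inliers = select_inlier_ranges(position, anchors, ranges)
```

Here the predicted distances are 10 m, about 14.14 m and about 14.14 m, so only the third measurement exceeds the 2 m threshold and is discarded.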
Real 3D positioning error is actually affected by several factors, e.g.: • quality of MEMS inertial sensors.
In our working conditions (UAV flying in an open area, without obstacles between anchors and the tag), UWB measurements are mostly in CLOS. The number of anchors is fixed to eight (all of the currently available ones). Since the frequency of available UWB range measurements significantly decreases when the distance from the devices is large, the anchor network configuration and UAV track have been designed in order to ensure that, along the whole track, measurements from at least four anchors are available at the standard frequency of the system. Furthermore, in order to reduce the error along the vertical direction (which can be larger than that on the horizontal direction, similarly to GNSS positioning), UWB anchors have been positioned at quite different altitudes (varying from 0 m to 4 m from the ground level), whereas during experimental validation (3D reconstruction of the façade of the building in Figure 1; see also Section 5), the UAV flying altitude varied between approximately 14 m and 20 m. To be more specific, 28 images have been acquired along two almost parallel tracks: the two tracks are along similar horizontal positions, but at different altitudes, in order to acquire images at the same distance from the building to be reconstructed. The spatial distance between two consecutive camera acquisitions is approximately 1.5 to 2 m. The average distance from the UAV to the building is approximately 60 m during the flight.
Fitting errors between camera positions estimated by UWB devices and at the end of photogrammetric reconstruction (in this comparison, control points have been used in order to ensure metric reconstruction as accurate as possible) are used in the following in order to provide a rough estimate of UWB positioning error.
According to the procedure described above, the average UWB positioning error was 0.33 m. Taking into account the average ranging error of Figure 2a, the estimated UWB positioning error is actually larger than expected. However, Figure 2 has been obtained while varying distances and relative orientations between UWB devices over a large range of values. Instead, in this case, there is not so much variation; hence, systematic errors (Figure 3) probably have a dominant effect with respect to the average behavior shown in Figure 2a. Furthermore, the comparison is done with respect to positions obtained indirectly, from the photogrammetric reconstruction: hence, estimation errors on the positions used for comparison might (partially) affect the results, as well.
Nevertheless, the obtained positioning accuracy shall be sufficiently good to be considered as usable in most of the cases when the GNSS signal is not available or not reliable.
UWB positioning presented above has the main goal of ensuring the best (off-line) metric reconstruction. However, on-line positioning can be of interest when GNSS is not available, e.g., to substitute GNSS during UAV flight. The EKF presented above can be used for this purpose. However, visual information shall also be used in order to further improve position estimates (vision-aided positioning [45][46][47]). The rationale is that, if 3D information about certain locations visible in the image is available a priori, then by recognizing such locations in the image in real time, it is possible to obtain a 3D estimate of the UAV position based on processing visual information. Use of visual information for estimation of camera position and orientation has also been considered in the Simultaneous Localization and Mapping (SLAM) problem and, in particular, in visual SLAM [48].
It is worth noticing that the vision-aided positioning briefly described above is actually similar to what is usually done with GCPs. Indeed, points with already known 3D locations have exactly the same role as GCPs. The main difference with respect to the standard use of GCPs is that here, they are intended to be used in on-line processing. A prior on their appearance in the images is assumed to be known, e.g., they can be manually selected in an image acquired before the flight. Then, when a new image is acquired, their positions in the image are automatically recognized, and camera external parameters (e.g., camera position with respect to already known GCP positions) are subsequently computed.
Control points considered in this section are different from GCPs considered in the rest of the paper. In particular, in this section, the set of control points is formed by two of the GCPs shown in Figure 1 (bottom) and a subset of the check points distributed on the building façade, which will be considered in the following for validation of the photogrammetric reconstruction accuracy. GCPs distributed on the building façade are shown in Figure 4. Since the recognition of GCP positions in a new image has to be done in real-time, particular attention has to be placed on the computational complexity associated with this task: it is well known that image processing can be quite computationally demanding; hence, a trade-off between accuracy and computational burden has to be set in order to allow real-time processing of the visual data.
In practice, instead of using high resolution images acquired by the camera, a resized version of them is considered in the following. Actually, the rest of this section aims at comparing vision-aided positioning results obtained with images at different reduced resolutions (the size of the original one is 5472 × 3648 pixels).
Notice that this comparison has been done off-line; however, the maximum image size (roughly 1 Megapixel, which is obtained by scaling each side of the original image by a factor 1/4) has been chosen in order to be processable in real time, i.e., within 1 to 2 seconds (which is the approximate time interval between two image acquisitions in our case study). Four scaling factors (1/4, 1/8, 1/16, 1/32) have been considered for the reduced image side sizes.
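For reference, the image sizes implied by the four scaling factors can be worked out directly from the original 5472 × 3648 resolution. The short sketch below only performs this arithmetic; the variable names are illustrative.

```python
ORIGINAL_SIZE = (5472, 3648)          # width, height in pixels
SCALE_DENOMINATORS = (4, 8, 16, 32)   # scaling factors 1/4, 1/8, 1/16, 1/32


def scaled_size(size, denominator):
    """Scale each image side by 1/denominator (integer pixel counts)."""
    width, height = size
    return width // denominator, height // denominator


sizes = {d: scaled_size(ORIGINAL_SIZE, d) for d in SCALE_DENOMINATORS}
megapixels = {d: w * h / 1e6 for d, (w, h) in sizes.items()}
```

The 1/4 factor gives 1368 × 912 pixels (about 1.25 Megapixels), and each further halving of the sides divides the pixel count by four, down to 171 × 114 pixels at 1/32.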
Automatic recognition of control point locations in each image has been implemented as follows. First, the estimate of the camera position provided by the EKF is used in order to have a rough prior on the GCP positions in the new image. To compute this prior, UAV orientation is assumed to be invariant with respect to the previous time instant (however, updating at least the heading shall be very important if trajectories quite different from rectilinear are considered). Priors computed as above are only used in order to determine local image areas to search for the GCPs, hence reducing the computational burden of the approach. Then, the best matches of GCP template descriptors are searched within the considered areas in the new image. Several types of matching techniques can be considered, and further information on the UAV dynamics can be introduced in the matching procedure, if available [49][50][51][52]. The results presented here have been obtained by using matching on salient points extracted with the Harris filter (a quite low threshold should be considered in order to have a sufficiently large number of feature points as candidates for the matches). Once the best match is found, a second local search among a few pixel areas in the neighborhood of the matched positions can be considered in order to further improve the estimates. Then, exterior camera parameters with respect to the known 3D point positions are estimated with Lowe's nonlinear method (camera pose estimation [53]).
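The first step above, i.e., turning a predicted camera pose into a local search area for a control point, can be sketched with a standard pinhole projection. The intrinsic matrix, the pose and the window half-size below are hypothetical values chosen for illustration, lens distortion is ignored, and the function names are not taken from the authors' implementation.

```python
import numpy as np


def project_point(K, R, camera_center, point_3d):
    """Project a 3D point with a pinhole model; return (u, v) or None if behind the camera.

    K: 3x3 intrinsic matrix; R: 3x3 world-to-camera rotation;
    camera_center: camera position in world coordinates.
    """
    p_cam = R @ (np.asarray(point_3d, dtype=float) - np.asarray(camera_center, dtype=float))
    if p_cam[2] <= 0:
        return None
    uv = K @ (p_cam / p_cam[2])
    return float(uv[0]), float(uv[1])


def search_window(center_uv, half_size, image_size):
    """Square window around the predicted pixel, clipped to the image bounds."""
    u, v = center_uv
    width, height = image_size
    left, top = max(0, int(u - half_size)), max(0, int(v - half_size))
    right, bottom = min(width, int(u + half_size)), min(height, int(v + half_size))
    return left, top, right, bottom


# Hypothetical intrinsics for a 1368 x 912 image and a camera at the origin.
K = np.array([[1000.0, 0.0, 684.0],
              [0.0, 1000.0, 456.0],
              [0.0, 0.0, 1.0]])
predicted_uv = project_point(K, np.eye(3), [0.0, 0.0, 0.0], [1.0, 0.5, 10.0])
window = search_window(predicted_uv, half_size=40, image_size=(1368, 912))
```

The window half-size would in practice be chosen according to the expected uncertainty of the predicted pose: a larger positioning or attitude error calls for a larger search area and hence a higher matching cost.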
Table 1 reports the obtained positioning results. Positions obtained by photogrammetric reconstruction are again considered as reference positions. The best available synchronization between UWB and camera measurements has been used (see Section 4 for notes on synchronization). Positioning errors for scale factors 1/4 and 1/8 (i.e., 7 cm and 14 cm, approximately) are good enough to be considered for improving positioning results obtained with the UWB system. However, it should be noticed that errors reported in the table are referring to camera positions obtained with photogrammetric reconstruction; hence, their goal is only that of providing a rough estimate of the potential performance.

Photogrammetric Reconstruction
In order to obtain 3D metric reconstruction without using control points, estimates of the exterior camera parameters are provided by the UWB positioning system. Since the camera is assumed to be calibrated, it allows obtaining 3D reconstruction in a local coordinate system up to a scale factor. Hence, the goal is that of exploiting information provided by the UWB positioning system in order to estimate the appropriate reconstruction scale factor and estimate the coordinate transformation to the UWB local reference system (which might be georeferenced), if needed.
The transformation between the point cloud reconstruction and the UWB coordinate system is estimated by fitting the camera positions given by the SfM reconstruction to their UWB positions. Let $R_{sfm2uwb}$, $t_{sfm2uwb}$, $s_{sfm2uwb}$ be the rotation matrix, translation vector and scaling factor describing the transformation from SfM to UWB coordinates. Initial values for these parameters are obtained with a closed form solution as follows:

• Let $\{x_{sfm,t}\}_t$ and $\{x_{uwb,t}\}_t$ be the estimated camera positions corresponding to the N acquired images.

• Consider $\{\bar{x}_{sfm,t}\}_t$ and $\{\bar{x}_{uwb,t}\}_t$, the centered and normalized versions of $\{x_{sfm,t}\}_t$ and $\{x_{uwb,t}\}_t$, i.e., let $m_{sfm}$ and $m_{uwb}$ be the averages of $\{x_{sfm,t}\}_t$ and $\{x_{uwb,t}\}_t$, respectively. Furthermore, let

$$\sigma_{sfm} = \sqrt{\frac{1}{N}\sum_t \| x_{sfm,t} - m_{sfm} \|^2}, \qquad \sigma_{uwb} = \sqrt{\frac{1}{N}\sum_t \| x_{uwb,t} - m_{uwb} \|^2}.$$

Then,

$$\bar{x}_{sfm,t} = \frac{x_{sfm,t} - m_{sfm}}{\sigma_{sfm}}, \qquad \bar{x}_{uwb,t} = \frac{x_{uwb,t} - m_{uwb}}{\sigma_{uwb}}. \qquad (5)$$

• The initial value of $R_{sfm2uwb}$ is computed by solving the orthogonal Procrustes problem between $\bar{x}_{sfm,t}$ and $\bar{x}_{uwb,t}$ by using Singular Value Decomposition (SVD).

• The scaling factor and translation vector are set to:

$$s_{sfm2uwb} = \frac{\sigma_{uwb}}{\sigma_{sfm}}, \qquad t_{sfm2uwb} = m_{uwb} - s_{sfm2uwb} R_{sfm2uwb} m_{sfm}. \qquad (6)$$

Then, the above initial values are optimized (nonlinear optimization) by minimizing the following:

$$\sum_t \| x_{uwb,t} - ( s_{sfm2uwb} R_{sfm2uwb} x_{sfm,t} + t_{sfm2uwb} ) \|^2. \qquad (7)$$

In order to further improve the integration of camera and UWB measurements, the UWB-estimated camera positions might also be introduced in the photogrammetric bundle adjustment, using the values computed above as initial values for the optimization process. However, UWB measurements are usually far fewer than tie-points; hence, in our case study, when they are weighted according to their measurement accuracy, the difference between the obtained solution and the values computed above is negligible.
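The closed form initialization described above can be sketched as follows. This is a minimal illustration, not the software used in this work: the function name `fit_similarity` is hypothetical, the normalization by the root mean square distance from the centroid is an assumption, and the subsequent nonlinear refinement is omitted.

```python
import numpy as np

def fit_similarity(x_sfm, x_uwb):
    """Closed form estimate of (R, t, s) such that x_uwb ~ s * R @ x_sfm + t.

    x_sfm, x_uwb: (N, 3) arrays of corresponding camera positions.
    """
    m_sfm = x_sfm.mean(axis=0)
    m_uwb = x_uwb.mean(axis=0)
    # Normalization factors: RMS distance from the centroid (assumed choice).
    sig_sfm = np.sqrt(((x_sfm - m_sfm) ** 2).sum(axis=1).mean())
    sig_uwb = np.sqrt(((x_uwb - m_uwb) ** 2).sum(axis=1).mean())
    xb_sfm = (x_sfm - m_sfm) / sig_sfm
    xb_uwb = (x_uwb - m_uwb) / sig_uwb
    # Orthogonal Procrustes via SVD of the 3x3 cross-covariance matrix.
    H = xb_sfm.T @ xb_uwb
    U, _, Vt = np.linalg.svd(H)
    # Force det(R) = +1 so that R is a rotation and not a reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    s = sig_uwb / sig_sfm
    t = m_uwb - s * (R @ m_sfm)
    return R, t, s

# Synthetic check with a known transformation (values are arbitrary).
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 3))
a = 0.3
R0 = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
y = 2.5 * x @ R0.T + np.array([1.0, -2.0, 0.5])
R, t, s = fit_similarity(x, y)
```

With noisy UWB positions, the ratio of the two spreads is only an initial value, which is why a nonlinear refinement follows. When the camera positions are nearly collinear, the rotation about the flight line is poorly constrained by this fit.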
It is worth noticing that time synchronization of UWB and camera acquisitions plays a fundamental role in the estimation of the coordinate transformation from the SfM to the UWB coordinate system. So far, the camera and UWB acquisitions have been assumed to be synchronized; this, however, is not the typical case.
Let image acquisition times (according to the internal camera clock) be those saved in the corresponding Exchangeable Image File Format (EXIF).The time lag between the camera clock and the UWB tag clock is estimated as follows:

• The sampling frequency of tag position estimates (given by (4)) is increased by a factor n (where n is the number of anchors) by linearly interpolating any two consecutive estimates.

• An initial time synchronization is manually selected. The error affecting this manual synchronization is assumed to be lower than one second. A discrete set of equally-spaced time lag values in the time interval (−1 s, +1 s) is considered, where the spacing between two consecutive time lags is 20 ms in our case study.

• Camera positions estimated by the UWB system change depending on the time lag value τ, i.e., $x_{uwb,t+\tau}$ is considered instead of $x_{uwb,t}$ in (7). Functional (7) is evaluated for each time lag value in the considered interval.

• The time lag τ between the camera and UWB measurements is estimated as the value minimizing (7) among the set of considered time lag values.
Although the above procedure provides an estimate of the time lag between the camera and UWB acquisitions, better estimation accuracy may be needed in certain cases, in particular at higher flight speeds. The reader is referred to [54] for a more detailed description of the problem and of the solutions that could be developed in order to improve synchronization accuracy.
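The grid search over time lags can be sketched as below. The names `interp_positions` and `estimate_time_lag` are hypothetical, and the `cost` argument stands in for functional (7): in practice it would refit the SfM to UWB transformation for each lag, whereas the toy example uses a simple sum of squared position differences. Interpolating directly at the shifted image times is used here in place of the upsampling step, which is equivalent for linear interpolation.

```python
import numpy as np

def interp_positions(t_query, t_uwb, x_uwb):
    """Linearly interpolate UWB positions x_uwb (M, 3), sampled at times t_uwb, at t_query."""
    return np.column_stack(
        [np.interp(t_query, t_uwb, x_uwb[:, k]) for k in range(x_uwb.shape[1])]
    )

def estimate_time_lag(t_img, t_uwb, x_uwb, cost, max_lag=1.0, step=0.02):
    """Return the lag (s) in (-max_lag, +max_lag) minimizing cost(UWB positions at t_img + lag)."""
    k = int(round(max_lag / step))
    lags = step * np.arange(-k + 1, k)      # equally spaced candidates, 20 ms apart by default
    costs = [cost(interp_positions(t_img + lag, t_uwb, x_uwb)) for lag in lags]
    return float(lags[int(np.argmin(costs))])

# Toy example: synthetic trajectory, camera clock offset by 0.3 s (arbitrary values).
def trajectory(t):
    return np.column_stack([t, np.sin(t), np.zeros_like(t)])

t_uwb = np.arange(0.0, 30.05, 0.1)
x_uwb = trajectory(t_uwb)
t_img = np.arange(5.0, 25.0, 1.0)
x_cam = trajectory(t_img + 0.3)             # camera positions already in UWB coordinates
lag = estimate_time_lag(t_img, t_uwb, x_uwb, lambda x: float(np.sum((x - x_cam) ** 2)))
```

The resolution of the estimate is bounded by `step`, which is the limitation noted above for faster flights.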
When the georeferenced positions of certain UWB devices are known (at least three), a georeferenced reconstruction can be obtained, as well (the coordinate transformation can be estimated with a procedure similar to that of (5)-(7)). The obtained results concerning reconstruction accuracy in the local and georeferenced coordinate systems will be shown in Section 5.
An accurate estimation of the scale factor is clearly of fundamental importance for a metric reconstruction: the reliability of the estimated scale improves as the number of considered images becomes larger. This is confirmed by the experimental results in Figure 5, which shows the scale factor estimation error, i.e., the root mean square error (RMSE) of the estimated scale factor with respect to the scaling value that allows the best fit between the photogrammetric reconstruction and the TLS survey. In Figure 5, this error is expressed as a percentage of the best scale value and, as expected, it decreases when a larger number of images is considered (the results on scale factor estimation in Figure 5 have been obtained by using the camera and UWB synchronization computed as previously described with all of the available images). Results shown in the figure have been obtained by means of 100 Monte Carlo simulations: in each simulation, images have been randomly sampled from the set of 28 available ones. However, a smart choice of such images can typically lead to better results: indeed, since the UWB ranging error is approximately independent of the measured range, it has a much lower percentage influence when larger distances are measured. In practice, in order to minimize the scale estimation error, it is suggested to use UAV tracks where the maximum distance between two track points is as large as possible.
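The dependence of the scale error on the number of images can be reproduced qualitatively with a small simulation. The sketch below does not use the data of this case study: the straight 50 m track, the 0.3 m position noise, the true scale and the function name `scale_rmse` are all assumptions chosen for illustration, and the scale is estimated simply as the ratio of the position spreads.

```python
import numpy as np

def scale_rmse(n_images, n_trials=100, noise_std=0.3, seed=0):
    """Percentage RMSE of the estimated scale over random subsets of a synthetic track."""
    rng = np.random.default_rng(seed)
    # Hypothetical track: 28 camera positions along a 50 m straight line.
    track = np.column_stack([np.linspace(0.0, 50.0, 28), np.zeros(28), np.full(28, 20.0)])
    true_scale = 4.0                        # arbitrary SfM-to-metric scale
    errors = []
    for _ in range(n_trials):
        idx = rng.choice(28, size=n_images, replace=False)
        x_uwb = track[idx] + rng.normal(0.0, noise_std, size=(n_images, 3))
        x_sfm = track[idx] / true_scale     # noise-free SfM positions, unknown scale
        spread_uwb = np.linalg.norm(x_uwb - x_uwb.mean(axis=0))
        spread_sfm = np.linalg.norm(x_sfm - x_sfm.mean(axis=0))
        errors.append(spread_uwb / spread_sfm / true_scale - 1.0)
    return 100.0 * np.sqrt(np.mean(np.square(errors)))

rmse_few = scale_rmse(3)
rmse_all = scale_rmse(28)
```

In this synthetic setting, subsets whose images are close to each other give the largest errors, in agreement with the suggestion to use track points that are as far apart as possible.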
Sparse reconstruction has been obtained by means of ad hoc software (which uses automatic feature extraction and matching and solves the bundle adjustment in order to estimate the 3D sparse point cloud and the camera exterior parameters, similarly to Structure from Motion procedures [46,47]), whereas dense point clouds have been obtained by using SURE (photogrammetric Surface Reconstruction from imagery [55]), which provides dense matching similarly to the semi-global matching algorithm [56]. For comparison, the reconstruction obtained with Agisoft PhotoScan [37] will also be shown in the following section.

Reconstruction Results
This section aims at estimating the accuracy of the photogrammetric results. In particular, the obtained reconstruction is compared with Terrestrial Laser Scanner (TLS) data (acquired by means of a Leica ScanStation C10), i.e., TLS data are used as the reference. Although TLS is currently considered the state of the art in surveying, it is worth noticing that photogrammetry can lead to larger dense point clouds (and potentially higher accuracies) in certain working conditions (in particular when the laser scanner cannot be positioned sufficiently close to the object of interest and/or it cannot be oriented in an appropriate way to allow a good view of the desired object features).
Nevertheless, in the considered case study, the TLS acquisition has been done by positioning the Leica ScanStation C10 on a ladder in order to allow a good acquisition of the building façade and of the terrain in front of it. A single acquisition with the Leica ScanStation C10 has been done with a point spacing of approximately 2 cm. Ninety check points (distributed on the building façade) have been considered to compare the photogrammetric and TLS reconstructions. As shown in Figure 6, check points have been chosen because of their clear morphological aspect (at vertices formed by the intersection of the edges of building components). The positions of check points in the point cloud have been initially set manually. Then, in order to improve their accuracy, the positions have been updated by locally fitting planes/lines in the neighborhood of each check point and using their intersection as the estimate of the vertex position. Given the TLS survey point spacing (approximately 2 cm), the method for improving check point positions and the TLS range error in the technical specifications of the instrument (4 mm and 6 mm of distance and position single measurement accuracies, respectively), the accuracy of check point positions in the TLS survey should be better than 2 cm. This value is one order of magnitude lower than the error of the photogrammetric reconstruction, shown in Table 2.
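The refinement of a check point as the intersection of locally fitted planes can be sketched as follows. This is a minimal illustration under the assumption that three non-parallel planes meet at the vertex; the function names `fit_plane` and `intersect_planes` and the synthetic data are hypothetical, and the selection of the neighboring points belonging to each plane is not shown.

```python
import numpy as np

def fit_plane(points):
    """Least squares plane through (N, 3) points: unit normal n and offset d, with n @ x = d."""
    c = points.mean(axis=0)
    _, _, Vt = np.linalg.svd(points - c)
    n = Vt[-1]                              # direction of smallest variance
    return n, float(n @ c)

def intersect_planes(planes):
    """Point closest (in the least squares sense) to all planes; exact for three independent planes."""
    N = np.array([n for n, _ in planes])
    d = np.array([d for _, d in planes])
    return np.linalg.lstsq(N, d, rcond=None)[0]

# Synthetic corner at (1, 2, 3) formed by the planes x = 1, y = 2 and z = 3.
rng = np.random.default_rng(1)
uv = rng.uniform(-0.5, 0.5, size=(30, 2))
plane_x = np.column_stack([np.full(30, 1.0), 2.0 + uv[:, 0], 3.0 + uv[:, 1]])
plane_y = np.column_stack([1.0 + uv[:, 0], np.full(30, 2.0), 3.0 + uv[:, 1]])
plane_z = np.column_stack([1.0 + uv[:, 0], 2.0 + uv[:, 1], np.full(30, 3.0)])
vertex = intersect_planes([fit_plane(p) for p in (plane_x, plane_y, plane_z)])
```

Because each plane is fitted to many points, the vertex estimate can be more accurate than the point spacing of the scan.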
First, photogrammetric reconstruction is considered in the local coordinate reference system of the UWB positioning system (Case A): metric reconstruction (e.g., scale factor) is estimated as shown in Section 4. In order to make these results comparable with those of the TLS acquisition (clearly, the coordinate reference system is different), least squares estimation of the rigid map (only rotation and translation are allowed) between the two point clouds is performed. Results computed on the considered check points are reported in the first row of Table 2. Columns in the table show the obtained root mean square error (RMSE), the average of the error absolute value and its standard deviation, and the maximum and minimum of the error absolute value. The second row of Table 2 shows the results obtained for the georeferenced case. Georeferenced positions of four UWB anchors are used in order to obtain the georeferenced reconstruction (Case B).
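The statistics listed in the columns of Table 2 can be computed as in the sketch below, once the two sets of check points are expressed in the same coordinate system. Taking the error of a check point as the Euclidean distance between its reconstructed and reference positions is an assumption of this example, as are the function name `error_stats` and the numeric values.

```python
import numpy as np

def error_stats(p_rec, p_ref):
    """Error statistics between reconstructed and reference check points, both (N, 3)."""
    e = np.linalg.norm(p_rec - p_ref, axis=1)   # per-point error magnitude
    return {
        "rmse": float(np.sqrt(np.mean(e ** 2))),
        "mean": float(e.mean()),
        "std": float(e.std()),
        "max": float(e.max()),
        "min": float(e.min()),
    }

# Two hypothetical check points with errors of 3 and 4 units.
p_ref = np.zeros((2, 3))
p_rec = np.array([[3.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
stats = error_stats(p_rec, p_ref)
```

Note that the RMSE is never smaller than the mean error, and the two coincide only when all check points have the same error.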
In certain cases, the use of a self-positioning system can be of interest. In this case, the relative positions of the UWB anchors are computed by means of their own UWB range measurements. Unfortunately, these measurements were not collected during the experiment. However, in order to provide a rough estimate of the possible results, the measurement process has been simulated by adding zero-mean Gaussian random noise (with a standard deviation of 0.15 m) to the distances between anchors derived from the positions measured with the Leica TCR 702 Total Station. The implemented method for UWB self-positioning is briefly described in Appendix B. The third and fourth rows of Table 2 show the results obtained in the UWB self-positioning case for reconstruction in the local (Case C) and georeferenced (Case D) coordinate systems; georeferencing is obtained by using the same four georeferenced UWB anchor positions used in Case B.
Finally, the fifth and sixth rows of Table 2 show the results obtained by computing the photogrammetric reconstruction with Agisoft PhotoScan, first by using five GCPs (Case E) (GCPs are shown in Figure 1), and then by using UWB measurements (and four georeferenced UWB anchors) in order to obtain a georeferenced reconstruction (Case F). Agisoft PhotoScan performed self-calibration in both cases.
GCPs shown in Figure 1 have been realized with pickets (approximately 1 m high), whose top extremities have been painted red in order to make them more easily identifiable in images. The error in Case A (and C) is affected by the scaling factor estimation error, but its effect is actually quite minor (≈1 cm). Figure 7 (top) shows the error distribution on the building façade (frontal view). In order to ease the readability of the results, hereafter, the coordinate system (x, y, z) is assumed to be aligned with the building façade: x and y are the two axes spanning the horizontal plane, with −y corresponding to the unit vector normal to the (approximately) planar building façade. The z axis corresponds to the vertical direction. The origin is set at the bottom-left corner of the building façade. Figure 7 (bottom) shows the error distribution on the x-y plane (top view). Errors in Figure 7 are amplified by a factor of five with respect to the building scale in order to make them more easily visible. Table 3 aims at providing more insight into the obtained results, in particular for the georeferenced Cases B, E and F. Columns ∆x, ∆y, ∆z show the error bias along the three coordinates x, y, z. The bias is easily visible in Figure 8, which shows the error distribution on the x-z plane (top, frontal view) and on the x-y plane (bottom, top view). Errors in Figure 8 are amplified by a factor of five with respect to the building scale in order to make them more easily visible.
The best (in the least squares sense) transformation (rotation, translation and scale) from the georeferenced reconstructions to the georeferenced TLS check point positions is estimated for each of the considered cases in order to estimate the contributions of the errors on camera orientations and positions. Let this transformation be defined by rotation matrix $R_\theta$, translation vector $p$ and scale $s$; then, the values of θ (the angle characterizing $R_\theta$ in the axis-angle representation of rotation matrices) and $p$ (expressed with respect to the building coordinate system) are shown in the fourth, fifth, sixth and seventh columns of Table 3. The scale factor error has a minor contribution (s is very close to one), and it is not reported in the table. It is worth noticing that if $R_\theta = I_3$ and the scale is one, then $p_x = -\Delta x$, $p_y = -\Delta y$, $p_z = -\Delta z$.
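The two quantities discussed above can be illustrated with a short sketch: the rotation angle θ obtained from the trace of a rotation matrix, and the translation that best fits the check points when rotation and scale are fixed to the identity and to one. The function names and the numeric values are hypothetical and are not the values of Table 3.

```python
import numpy as np

def rotation_angle(R):
    """Rotation angle (rad) of a 3x3 rotation matrix in its axis-angle representation."""
    return float(np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)))

def best_translation(p_rec, p_ref):
    """Least squares translation from reconstructed to reference points (R = I, s = 1)."""
    return (p_ref - p_rec).mean(axis=0)

# Small rotation about the vertical axis (arbitrary angle).
a = 5.4e-3
Rz = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
theta = rotation_angle(Rz)

# With R = I and s = 1, the translation is the opposite of the error bias.
p_ref = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 5.0], [20.0, 0.0, 2.0]])
bias = np.array([0.10, -0.20, 0.05])
p = best_translation(p_ref + bias, p_ref)
```

In the general case, rotation and translation are estimated jointly, so the entries of p no longer equal the opposite of the bias.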

Discussion
The main aim of this paper is that of showing the possibility of obtaining 3D metric reconstruction without using control points from UAV aerial imagery (similarly to direct georeferencing [25,28]), by exploiting a UWB positioning system: the possibility of using UWB positioning when GNSS is not available (e.g., in tunnels, indoors, close to high buildings or mountains, in critical environments) could make it an attractive solution in such operating conditions.
The UWB ranging characterization of Section 2 shows that the Pozyx UWB system is characterized by a quite complex systematic error (Figure 3). Such complexity is particularly apparent when comparing the results for the two different cases of Figure 3b (obtained with devices at real distances of 3.10 m and 3.25 m, respectively).
Nevertheless, the results shown in Figure 2 and the need for a system that is as simple as possible for an end user motivate the modeling of UWB measurements as bias free and with a constant standard deviation. Despite this simplification, the results of Section 3 show that the positioning error (0.33 m according to the rough estimation done in our case study) is probably low enough to be usable in several applications. Better results can be expected when properly calibrating the UWB devices [13,20].
UWB positioning has been obtained by a multi-step procedure, involving the use of an EKF (which can be initialized by using static measurements, as shown in Appendix A) and of nonlinear optimization, where the former can also be run in real time, if needed, while the latter is clearly an off-line optimization. The goal of the nonlinear optimization is to exploit trajectory regularity in order to significantly reduce the effect of UWB measurement errors on the estimated trajectory. The use of a quite regular trajectory eased this process, while reducing the risk of excessive smoothing.
Synchronization of the camera and UWB devices is of fundamental importance in order to integrate their information. In this work, the estimation of the synchronization time lag between camera and UWB measurements has been integrated in the process of estimating the coordinate transformation between the SfM and UWB systems (Section 4). The adopted procedure considered time lag estimation with a resolution of 20 ms. This, however, might be insufficient for UAVs flying at higher speed. Other strategies should be considered for improving the synchronization accuracy when needed [54].
Once synchronization is ensured, vision can be used to improve positioning results by using quite low resolution images (see Table 1) once certain 3D positions in the scene are already known.
Reconstruction accuracy has been assessed by comparison with TLS data: Table 2 shows reconstruction error (in terms of RMSE, average of absolute value of errors and its standard deviation, maximum and minimum error) in different cases.
First, Case A shows the reconstruction error in a local coordinate system. The main goal of the UWB system in this case is the estimation of the scale factor between the SfM reconstruction and the real scene. The estimated scale is close to the real one, and its estimation error has a minor effect on the overall reconstruction error, which hence is mostly due to the SfM reconstruction. A minor contribution to the overall reconstruction error is also due to the determination of check point positions in the photogrammetric and TLS point clouds, as well as to the TLS measurement error.
Case B considers georeferencing the photogrammetric reconstruction by using the georeferenced positions of four UWB devices (in a direct georeferencing-like fashion, without using control points). The obtained result (RMSE = 0.29 m, average error 0.28 m) is influenced by several factors: (i) the error in determining the coordinate transformation between the SfM reconstruction and the UWB system; (ii) the non-coincident positions of the UWB tag and the camera; and (probably to a minor extent) (iii) the georeferencing error of the four UWB devices considered for this purpose. An important role in (i) is played by the UWB positioning procedure and by the camera-UWB synchronization. For instance, although the UAV flight speed in the considered case study is quite low, using a synchronization time lag that differs by 0.2 s from the selected one approximately doubles the reconstruction error in this dataset. This issue will be more severe at higher flying speeds.
Cases C and D are similar to A and B, respectively. The main difference with respect to A and B is given by the simulated self-positioning of the UWB devices (Appendix B). Results obtained in Case C are quite similar to those of Case A, hence indicating that self-positioning has led to a relatively small estimation error of the reconstruction scaling factor. In contrast, results in Case D are clearly worse than those in B. This is caused by an increase of the estimation error in the other parameters of the coordinate transformation (rotation and translation).
For comparison, Cases E and F show the results obtained by means of Agisoft PhotoScan. Case E considers indirect georeferencing, i.e., by using GCPs (and camera self-calibration). In this case, the obtained results are comparable with those obtained in the local reconstruction Case A (this confirms that the scale has been appropriately estimated in A), whereas indirect georeferencing has provided better results than direct georeferencing with UWB devices, as shown in both Cases B and F. Case F resulted in a slightly worse result with respect to B, probably due to camera self-calibration (but, considering the standard deviation values, it might also be due to chance).
Table 3 aims at providing more insight into the obtained reconstruction error for the georeferenced Cases B, E and F. All error bias values (∆x, ∆y, ∆z) are significantly different from zero in Cases B and F (the bias can also be seen in Figure 8 for Case B). The coordinate transformation that allows the best fit of the reconstructed check point positions to the TLS ones has been computed in order to provide an estimate of the causes of the bias. The estimated translation values $p_x$ and $p_z$ are quite different from zero for both B and F, suggesting a systematic error in the estimation of the camera positions. Factors (i), (ii) and (iii) listed above for Case B, together with the UWB positioning error, are the most probable causes of this issue. The orientation error θ is also significant. In particular, it is worth noticing that at a distance of 60 m (that is, approximately the average distance of the UAV from the building façade during the camera acquisitions), the effect of an orientation error of $5.4 \times 10^{-3}$ rad is 0.32 m. A clear consequence of this observation is that repeating the experiment may give worse results than those reported here: Case B probably benefits from a favorable circumstance, because orientation and translation errors partially cancel each other out. Taking this into account, the expected reconstruction error of the system should probably be higher (≈0.4 m).
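The value of 0.32 m quoted above follows from the small-angle approximation, in which a rotation by θ displaces a point at distance d from the rotation center by approximately dθ:

```latex
e \approx d\,\theta = 60\ \mathrm{m} \times 5.4 \times 10^{-3}\ \mathrm{rad} = 0.324\ \mathrm{m} \approx 0.32\ \mathrm{m}
```

The exact chord length, $2 d \sin(\theta/2)$, differs from this value by far less than a millimeter for such a small angle.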
The bias in Case E (i.e., ∆x) is also larger than expected. This is probably due to the positions of the GCPs, which are not on the building façade (they are several meters away from it, as shown in Figure 1). Actually, the positions of the GCPs have been chosen in order to simulate the case of indirect georeferencing when the area to be surveyed is not reachable.
The obtained results show that the considered system does not reach the same level of accuracy as photogrammetry with indirect georeferencing (nor, clearly, that of TLS). However, it could be of interest for applications in areas where GNSS is not available or not reliable and where TLS is not a viable choice because of difficulties in bringing the instrument to the area of interest.
The main limitation of the considered system is probably the quite limited range of the low cost Pozyx devices: since the frequency of available UWB range measurements significantly decreases at large distances, the UAV track must be limited to an area quite close to the UWB devices. Consequently, a larger number of UWB devices has to be considered when surveying larger areas. This, however, can be a quite stringent requirement for the use of the system, in particular in areas that are difficult to reach on the ground and when quick activation of the system is required, e.g., during emergencies. Alternatively, the use of higher grade (and hence, more expensive) UWB devices can be considered in order to partially tackle this issue (e.g., a maximum range of approximately 1 km, according to the technical specifications of high grade UWB devices currently on the market). Both increasing the number of UWB devices and considering higher grade UWB devices clearly increase the cost of the system, making it less attractive.

Conclusions
This work considered the use of Pozyx, a low cost UWB positioning system, in order to provide photogrammetric reconstruction by means of images taken by a UAV.
Results on the considered case study have shown a reconstruction accuracy comparable to that of indirect georeferencing with GCPs when dealing with reconstruction in a local coordinate system. In this case, positioning based on the UWB system is used mostly to estimate the correct reconstruction scaling factor.
Reconstruction in the direct georeferencing-like case led to a larger error, influenced by several issues, mostly camera-UWB system synchronization and the non-coincident positions of the camera and the UWB tag. Nevertheless, the obtained results might be of some interest for applications where GNSS is not available and TLS is not a viable solution.
Considering a larger number of images typically allows reducing the estimation error of both the reconstruction scaling factor and the camera-UWB synchronization time lag. The use of a longer and more complex trajectory should also improve the estimation of the SfM-UWB coordinate transformation. However, this is typically obtained by increasing the UAV track length, which, because of the limited maximum range of the Pozyx system, requires the use of a larger number of devices, making it a less viable option. UAV trajectory regularity must also be ensured in order to allow the nonlinear positioning optimization to appropriately smooth the estimated trajectory.
According to our tests, the measurement accuracy of the considered Pozyx system is lower than, but similar to, the nominal one. Interestingly, Pozyx is a low cost system that automatically detects and synchronizes all of the available Pozyx devices once turned on. The main limitation is the limited maximum range of the system, which in our tests was significantly lower than the nominal one, despite setting the transmission power to its maximum level.
Future investigations will be dedicated to comparing the performance of low cost and high grade UWB devices, to evaluating the effect of the UWB anchor network on positioning and reconstruction accuracy, to developing a better UWB-camera synchronization method in order to make the system usable with a UAV flying at higher speed, and to validating the system in other environments (e.g., cluttered environments).
To conclude, the obtained results are quite interesting in terms of the reconstruction accuracy obtained in the considered case study. However, the considered system currently has limitations (in particular, due to the UWB device maximum range) that might restrict its usability. Either improvements on low

Figure 1. Façade of the university building considered as case study (top). Front view of the reconstructed façade (middle). Top view of the scene (bottom). Ground control point positions (used in the indirect georeferencing case) are shown in the bottom image with yellow markers.

Figure 2. Histogram of UWB ranging error: (a) errors from all devices; (b) errors from just one device.

Figure 3. Ranging error characteristics (bars correspond to ± one standard deviation centered on the bias): errors are obtained by varying (a) only the distance and (b) only the relative orientation between the devices.

Figure 4. Control points on the building façade considered for vision-aided positioning.

Figure 5. Scale factor estimation error varying the number of images randomly taken from the available ones.

Figure 6. Photogrammetric reconstruction of the building façade (front view). Positions of the 90 check points used to estimate reconstruction accuracy (red circles).

Figure 7. Errors, amplified by a factor of five with respect to the reconstruction scale, on check points for the reconstruction in local coordinates. Frontal view of the building façade (top) and top view (bottom).

Figure 8. Errors, amplified by a factor of five with respect to the reconstruction scale, on check points for the georeferenced reconstruction. Frontal view of the building façade (top) and top view (bottom).

Table 1. Vision-aided positioning error varying the scale factor of the processed image with respect to its original format.