Photogrammetric 3D Model via Smartphone GNSS Sensor: Workflow, Error Estimate, and Best Practices

Abstract: Geotagged smartphone photos can be employed to build digital terrain models using structure from motion-multiview stereo (SfM-MVS) photogrammetry. Accelerometer, magnetometer, and gyroscope sensors integrated within consumer-grade smartphones can be used to record the orientation of images, which can be combined with location information provided by inbuilt global navigation satellite system (GNSS) sensors to geo-register the SfM-MVS model. The accuracy of these sensors is, however, highly variable. In this work, we use a 200 m-wide natural rocky cliff as a test case to evaluate the impact of consumer-grade smartphone GNSS sensor accuracy on the registration of SfM-MVS models. We built a high-resolution 3D model of the cliff, using an unmanned aerial vehicle (UAV) for image acquisition and ground control points (GCPs) located using a differential GNSS survey for georeferencing. This 3D model provides the benchmark against which terrestrial SfM-MVS photogrammetry models, built using smartphone images and registered using built-in accelerometer/gyroscope and GNSS sensors, are compared. Results show that satisfactory post-processing registrations of the smartphone models can be attained, requiring: (1) wide acquisition areas (scaling with GNSS error) and (2) the progressive removal of misaligned images, via an iterative process of model building and error estimation.


Introduction
Structure from motion-multiview stereo (SfM-MVS) photogrammetry workflows allow the construction of high-resolution 3D models of landforms by matching narrow-baseline, partly overlapping aerial and/or terrestrial photo surveys [1][2][3][4]. In practice, the application of SfM-MVS photogrammetry to close-range remote sensing of the planet's surface requires two fundamental procedures, namely (i) model building and (ii) model registration. Model building initiates with the acquisition of image data and encompasses image key point detection and matching, camera pose estimation, and sparse scene reconstruction.
This work aims to test the accuracy of terrestrial SfM-MVS photogrammetric models registered with smartphones, addressing potential issues pertaining to scene reconstruction and (geo)location. For this purpose, we used the Pietrasecca (L'Aquila, Italy) exposure in central Italy as a test case. We initially built a 3D model using UAV photogrammetry (hereinafter named the UAV model). We then registered the UAV model using ground control points (GCPs), located by fast-static positioning using a differential GNSS antenna. Next, we built two separate terrestrial SfM-MVS photogrammetry models using smartphone images (hereinafter named smartphone models). We registered these smartphone models using data from the accelerometer/gyroscope and GNSS sensors of the smartphone, following the workflow illustrated in Figure 2 and in the step-by-step video tutorial (https://youtu.be/Jcoa0cc0L4I). Finally, we compared the smartphone models against the UAV model. Results of this comparison are discussed, with best practices for image acquisition proposed.

Figure 1.
Scheme showing in map view the effect of lens distortion on scene registration. Models using only photographs oriented perpendicular to the reconstructed landscape are affected by doming of the scene, which leads to erroneous estimates of camera position during structure from motion-multiview stereo (SfM-MVS) reconstruction. Models using oblique photos are not affected by doming, but the reconstructed orientation and position of photos oblique to the scene exhibit large errors. In both cases, utilizing XYZ coordinates for an image that has been misaligned during SfM-MVS reconstruction stretches the scene and produces major scaling errors.

The Site
The reconstructed scene is a rocky cliff situated in the village of Pietrasecca in Central Italy (Latitude 42.135° N, Longitude 13.134° E) (Figure 3). The cliff consists of two nearly vertical, perpendicular rock walls exposing fractured Miocene limestones surrounding the Pietrasecca fault [31]. The western wall is oriented NNW-SSE and is 10-20 m high, whereas the northern wall is oriented WSW-ENE and is ∼50 m high. The two walls are separated by a 5 m wide fault core zone striking WSW-ENE.

SfM-MVS Model from UAV Photogrammetry
We acquired 542 photos on 1 February 2020 (Figure 3A), between 11:51 and 13:36 CET, using a DJI Mavic 2 Pro equipped with a Hasselblad camera with a 1"-sized, 20 Mpx CMOS sensor. The focal length was 10.26 mm (corresponding to a 28 mm equivalent focal length in 35 mm format) and flight distances from the cliff ranged between 7 and 20 m, resulting in a ground sampling distance spanning from 2 to 5 mm/pixel. We processed these photos in the Agisoft Metashape software (formerly Photoscan), version 1.6.2, and generated a point cloud composed of 1.43 × 10⁸ vertices (Figure 4). The area of the reconstructed surface is 13,933 m², resulting in an average point density of 1.03 point/cm². We measured seven GCPs using a differential GNSS Leica Geosystems GS08 antenna, which were later used for model georeferencing (Figure 3A). The model was georeferenced directly in Metashape, by inputting the measured coordinates of the manually identified GCPs. The difference between the measured and estimated position for the seven GCPs was <5 cm, which represents a proxy for the georeferencing error. Accuracy of the model in the X and Y coordinates was also evaluated by projecting an orthophoto built from the model over a 1:5000 topographic map (Figure 3B). Mismatches are on the order of <50 cm, which is far below the nominal cartographic accuracy of the map (1 m). Based on the availability of GCPs and the high number of input images, we consider the UAV model as ground truth data to benchmark the subsequent smartphone models, which form the main focus of this study.

SfM-MVS Model from Smartphone
For smartphone image acquisition, we used a Xiaomi 9T Pro smartphone equipped with the Broadcom BCM47755 dual-frequency (E1/L1 + E5/L5) GNSS chip. Image acquisition was carried out by walking at a distance spanning from 10 to 50 m from the exposure and keeping the long axis of the smartphone nearly horizontal and in a nearly upright position (i.e., the inclination was <10°). The availability of dual-frequency GNSS smartphones, jointly with access to GNSS raw measurements in Android, represents a breakthrough for precise geolocation with user-grade technology. Access to raw GPS/Galileo measurements allows using algorithms once restricted to professional geodetic GNSS receivers. This, in turn, allows users to fully benefit from the capabilities offered by GPS and Galileo satellites. Thanks to the E5/L5 and E1/L1 frequencies, chipsets and receivers benefit from better accuracy, improved code-tracking pseudorange estimates, and a faster transition from code to phase tracking. The simultaneous use of two frequencies reduces the propagation errors due to the ionosphere. In addition, frequency diversity provides more robustness to interference and jamming. It is well known that the E5/L5 frequency makes it easier to distinguish real signals from those reflected by buildings, reducing the multipath effect, which is a major source of navigation, static, and RTK positioning error in cities and other challenging environments [32]. All this provides improved positioning and navigation in both urban and mountainous environments.
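The ionospheric benefit of dual-frequency tracking can be illustrated with the standard ionosphere-free combination, in which the first-order ionospheric delay (proportional to 1/f²) cancels out. The sketch below is purely illustrative: the pseudorange and delay values are invented numbers, not real measurements.

```python
# Ionosphere-free pseudorange combination for a dual-frequency (L1 + L5)
# receiver. The first-order ionospheric delay scales with 1/f^2, so a
# frequency-weighted combination of the two pseudoranges cancels it.

F_L1 = 1575.42e6  # Hz, GPS L1 carrier (same band as Galileo E1)
F_L5 = 1176.45e6  # Hz, GPS L5 carrier (same band as Galileo E5a)

def iono_free(p1: float, p5: float) -> float:
    """Return the ionosphere-free combination of two pseudoranges (metres)."""
    g1, g5 = F_L1**2, F_L5**2
    return (g1 * p1 - g5 * p5) / (g1 - g5)

# Hypothetical example: a 3 m ionospheric delay on L1 maps to
# 3 * (f1/f5)^2 m on L5; the combination recovers the geometric range.
true_range = 21_000_000.0          # m, invented geometric range
iono_l1 = 3.0                      # m, invented first-order delay on L1
iono_l5 = iono_l1 * (F_L1 / F_L5) ** 2
p_l1 = true_range + iono_l1
p_l5 = true_range + iono_l5
print(iono_free(p_l1, p_l5))       # recovers the geometric range (~21,000,000 m)
```

The combination amplifies receiver noise slightly, which is one reason single-frequency smartphone fixes are not simply worse by the full ionospheric delay; the point here is only the structural cancellation of the 1/f² term.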
We constructed the two SfM-MVS models using two 12 Mpx resolution photographic datasets. In detail, Model 1 was built using 48 photographs taken on 6 June 2020, between 09:16 and 09:22 CET, whereas Model 2 was built using 31 photographs taken on 30 June, between 10:26 and 10:30 CET. For both models, the acquisition speed was ~1 photo every 10 s. The two models consist of 2.15 × 10⁷ points (Model 1) and 1.39 × 10⁷ points (Model 2) over a surface of 8467 m² (Model 1) and 12,898 m² (Model 2), with a mean vertex density of 0.25 point/cm² and 0.11 point/cm², respectively. We acquired the photographs using the AngleCam App for Android, which provides the photos' orientation via the magnetometer and accelerometer/gyroscope sensors. The AngleCam App also provides an estimate of the precision of the camera positioning, which for Model 1 was ~3.8 m for 40 photos and ~10 m for 8 photos, whereas for Model 2 it was ~3.8 m for all the photographs. The precision of the obtained coordinates is apparently not in accordance with the E5/L5 + E1/L1 GNSS mode, which should ensure cm-level accuracy. As later discussed, this is due to the high acquisition speed, which prevents stabilization of the measurement.
We used orientation and position information prior to and after subsequent steps of image filtering to register the smartphone models, leading to the construction of 15 and 9 pairs of sub-models for Model 1 and Model 2, respectively, following the workflow illustrated in Figure 2.

Ground Truth Smartphone Models
To standardize the comparison of the smartphone sub-models, we built ground truth smartphone models for both smartphone Model 1 (GTSModel 1) and Model 2 (GTSModel 2). In detail, we built GTSModel 1 and 2 in CloudCompare, where we manually aligned the unreferenced point clouds of smartphone Model 1 and Model 2 to the georeferenced UAV model using 16 point pairs (Figure 5A). These point pairs are well recognizable in all the models and were selected in unvegetated portions of the cliff. The manual alignment (instead of an automatic one) was necessary to overcome issues related to the vegetation: the UAV and smartphone surveys were performed in different periods (February vs. June) and during this time lag the vegetation had grown. Apart from differences due to vegetation, namely trees and bushes occurring in the smartphone models and not present in the UAV model, GTSModel 1 and GTSModel 2 show a very good agreement with the UAV model, with the cloud-cloud nearest neighbor distance computed in CloudCompare being largely below 20 cm for both models (Figure 4). The alignment of the smartphone models in CloudCompare returns a transformation matrix that we used to reconstruct the estimated position of the photographs. The result of this reconstruction is shown in Figure 5B,C, where values for ∆X, ∆Y, and ∆Z are provided. This mismatch includes different effects: (i) systematic and random errors associated with the GNSS data and (ii) errors in the derivation of camera position during structure from motion estimation, including the lens distortion effect mentioned in the introduction. As shown in Figure 1, this latter problem can produce: (1) a doming effect (i.e., a non-uniform shift of the photos' position along the survey transect), which is readily detected by analyzing camera position in the direction parallel to the reconstructed scene, or (2) a shift of photos with a view direction oblique to the reconstructed scene.
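The cloud-cloud nearest neighbor distance mentioned above was computed in CloudCompare; as a self-contained illustration of the metric itself, the sketch below implements a brute-force version on invented toy clouds (for clouds of realistic size, a KD-tree or CloudCompare's own tool would be used instead).

```python
import numpy as np

def cloud_to_cloud_distance(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Distance from each source point to its nearest neighbour in the
    reference cloud (brute force; fine for small clouds only)."""
    diffs = source[:, None, :] - reference[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)

# Toy check: a slightly jittered copy of a reference cloud stays close to it,
# mimicking two well-aligned models of the same surface.
rng = np.random.default_rng(0)
reference = rng.uniform(0.0, 10.0, size=(500, 3))   # invented point cloud
source = reference + rng.normal(0.0, 0.02, size=reference.shape)
distances = cloud_to_cloud_distance(source, reference)
print(distances.mean())  # a few centimetres, well below the 20 cm in the text
```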
The doming effect can be identified by projecting estimated and measured positions along a direction approximately parallel to the scene (i.e., the direction labeled "diagonal path" in Figure 5B,C). We computed the camera positions along the diagonal path (PAP) for both the estimated (after model alignment) and the measured camera positions. The difference between the two positions for Model 1 varies along the path (Figure 5D), evidencing the occurrence of doming. Model 2 is less affected by this deformation (Figure 5E), as shown by the high degree of data scattering. For both models, the negative value of the slope indicates that the reconstructed PAP of photos tends to be slightly closer to the center of the path than the measured one. On the other hand, plots of the azimuth of the camera view directions versus PAP (Figure 5F,G) clearly evidence that the apparent shift of positions is related to the fact that the camera view direction changes along the diagonal path, i.e., oblique photos are mostly taken at the edges of the survey transect. In the following sections, we show how to mitigate lens distortion issues through the recognition and progressive removal of misoriented photos.
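The PAP projection amounts to projecting each camera's XY position onto the unit vector of the path. A minimal sketch follows; the path endpoints and camera positions are hypothetical, chosen only to make the example concrete.

```python
import numpy as np

def position_along_path(xy: np.ndarray, path_start, path_end) -> np.ndarray:
    """Project camera XY positions onto the (straight) diagonal path,
    returning the scalar position of each camera along it (metres)."""
    start = np.asarray(path_start, dtype=float)
    direction = np.asarray(path_end, dtype=float) - start
    direction /= np.linalg.norm(direction)
    return (xy - start) @ direction

# Hypothetical cameras near a 30 m path running along the X axis.
cams = np.array([[0.0, 0.0], [10.0, 1.0], [20.0, -1.0]])
pap = position_along_path(cams, (0.0, 0.0), (30.0, 0.0))
print(pap)  # 0, 10 and 20 m along the path
```

Doming would then show up as a systematic trend when the difference between estimated and measured PAP is regressed against PAP, as in Figure 5D,E.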

Registration of Smartphone Models and Comparison with Aligned Models
The Metashape outputs for the two smartphone models which form the focus of this study were initially unreferenced and unscaled (i.e., they were output within an arbitrary reference frame). Subsequently, these models were georeferenced following the procedure described in Tavani et al. [16], whereby the model is initially rotated using camera orientation data, and then scaled, translated, and re-oriented using positional data provided by the GNSS, as detailed below.
Comparison between the measured (by AngleCam) and estimated (i.e., reconstructed by the SfM-MVS software) camera orientations represents the first step in model registration. The orientation of each image can be described by three mutually orthogonal directions: the first is the view direction (ξ), whereas the other two correspond to the directions parallel to the long (ρ) and short axes of the photo, respectively (Figure 6; Table 1). For smartphone photos, the measured directions (ξ and ρ) are obtained from orientation data given by the AngleCam Android App. The estimated view directions can be derived from the reconstructed Yaw, Pitch, and Roll angles in the arbitrary reference frame defined within Metashape. Alternatively, for a simpler derivation, we suggest that users export the cameras from Metashape in the N-View Match format (*.nvm) and import them into the Meshlab software (https://www.meshlab.net/), where ξ and ρ are represented in a more readable and visually inspectable form as unit vectors.
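As a sketch of how ξ could be recovered directly from exported cameras, the snippet below converts a camera quaternion into a world-frame view direction, assuming the VisualSFM NVM convention (the quaternion rotates world into camera coordinates, with the camera looking along +Z in its own frame). Conventions differ between tools, so treat this as illustrative rather than as the procedure used in the paper.

```python
import numpy as np

def quat_to_rotmat(qw, qx, qy, qz):
    """Rotation matrix from a unit quaternion (w, x, y, z)."""
    q = np.array([qw, qx, qy, qz], dtype=float)
    q /= np.linalg.norm(q)
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def view_direction(qw, qx, qy, qz):
    # Assuming the quaternion maps world to camera coordinates and the
    # camera looks along +Z in its own frame, the world-frame view
    # direction xi is R^T @ (0, 0, 1).
    return quat_to_rotmat(qw, qx, qy, qz).T @ np.array([0.0, 0.0, 1.0])

# Identity rotation: the camera looks along world +Z.
print(view_direction(1, 0, 0, 0))
```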
Figure 6. Pinhole camera model. The camera center and focal length are shown outside the phone for the sake of simplicity. ξ is the unit vector defining the camera view direction; ρ is the unit vector orthogonal to the camera view direction and containing the image long axis.

Table 1.
GTSModel — Ground truth smartphone model: smartphone model aligned with the georeferenced UAV model
PAP — Position along the diagonal path (the path is shown in Figure 5): projection of camera position along the path
ξ — Camera view direction
ρ — Axis orthogonal to the camera view direction and containing the long axis of the image
∆ξ — Difference between the measured and the estimated-and-rotated ξ

Orienting the model seeks to find the rotation axis and angle which align the ξ and ρ of the Metashape unreferenced model with those measured using the smartphone. We retrieved these two parameters using the method described in Tavani et al. [25] and subsequently used them to rotate the estimated camera positions of the two smartphone models. After this rotation, the camera locations are translated and scaled. The translation vector is obtained by subtracting the centroid of the estimated-and-rotated SfM-MVS camera locations from the centroid of the GNSS-measured camera locations, while the scaling factor is obtained by minimizing the residual sum of squares (RSS) between the previously translated SfM-MVS camera positions and their measured equivalents. The position of the cameras obtained at this stage is hereinafter named the registered camera position (RCP). This procedure is further optimized by aligning the RCP XY coordinates to the measured camera XY coordinates, by rotating the data around a vertical axis. The position of the cameras obtained at this stage is hereinafter named the finely registered camera position (FRCP). These two steps of RCP and FRCP computation are achieved using OpenPlot [33].
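The translation-and-scaling step can be sketched as follows. This is a minimal reimplementation of the idea (centroid matching plus a closed-form scale factor minimizing the RSS), not the OpenPlot code itself, and the camera coordinates are invented.

```python
import numpy as np

def translate_and_scale(estimated: np.ndarray, measured: np.ndarray):
    """Centre the SfM camera positions on the GNSS-measured ones, then find
    the scale factor minimising the residual sum of squares between them."""
    est_c = estimated - estimated.mean(axis=0)
    mea_c = measured - measured.mean(axis=0)
    # Closed-form minimiser of || s * est_c - mea_c ||^2 over the scale s
    s = (est_c * mea_c).sum() / (est_c ** 2).sum()
    registered = s * est_c + measured.mean(axis=0)
    return registered, s

# Toy check: measured positions are the estimated ones scaled by 2 and shifted,
# so the recovered scale should be 2 and the registered positions should match.
est = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
mea = 2.0 * est + np.array([100.0, 200.0, 50.0])
reg, s = translate_and_scale(est, mea)
print(s)  # 2.0
```

The additional FRCP rotation about the vertical axis would be solved analogously, as a 2D least-squares rotation of the XY coordinates.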
For each photo, we also computed the angular difference between the estimated-and-rotated and the measured ξ and ρ directions, i.e., ∆ξ and ∆ρ. We used the average of ∆ξ and ∆ρ (∆λ) to characterize the orientation mismatch. We recursively performed the aforementioned procedure of model registration, eliminating at each step the three photos with the highest values of ∆λ, until we reached a minimum number of suitable photos (i.e., >8). During these iterations, we produced 15 pairs of sub-models (i.e., 15 RCP and 15 FRCP) for smartphone Model 1, and 9 pairs of sub-models for smartphone Model 2. This process is aimed at removing misoriented outlier photos, to investigate if and how their removal improves the registration process. As an example, Figure 7 shows ∆λ versus the normalized position along the diagonal path (the trace of the path is shown in Figure 5B,C) and the camera view direction (ξ) for six sub-models. The slope of the best fit lines reduces with the progressive removal of photos with higher ∆λ, becoming almost zero in the sub-models with the fewest photos, meaning that ∆λ for these models decreases and becomes ostensibly insensitive to camera position and orientation.
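The selection logic of this iterative filtering can be sketched as follows. The re-registration and ∆λ re-estimation performed between iterations in the actual workflow are omitted here, and the ∆λ values are invented, so this only illustrates how photo subsets shrink.

```python
import numpy as np

def iterative_filtering(delta_lambda: np.ndarray, min_photos: int = 8,
                        batch: int = 3):
    """Yield successive photo index subsets, dropping the `batch` photos with
    the highest orientation mismatch (delta-lambda) at each iteration, while
    at least `min_photos` photos would remain. In the real workflow the model
    is re-registered and delta-lambda re-estimated after every drop."""
    keep = np.arange(delta_lambda.size)
    while keep.size - batch >= min_photos:
        yield keep
        worst = np.argsort(delta_lambda[keep])[-batch:]  # positions within keep
        keep = np.delete(keep, worst)
    yield keep

# Invented per-photo mismatch values (degrees) for 11 photos.
mismatch = np.array([1.0, 9.0, 2.0, 8.0, 1.5, 7.0, 0.5, 3.0, 2.5, 1.2, 0.8])
subsets = list(iterative_filtering(mismatch))
print([s.size for s in subsets])  # [11, 8]
```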
The importance of ∆λ vs PAP and ∆λ vs ξ relies on the fact that, differently from parameters shown in Figure 5D-G (i.e., the difference between measured and estimated camera position), they are derivable factors regardless of the availability of ground truth models. Hence, defining how these parameters vary according to the quality of georeferencing enables their use to evaluate the georeferencing quality when GCPs are unavailable.
Figure 7. Normalized position along the diagonal path (trace of the path is shown in Figure 5) and view direction of photographs vs. the angular difference between the estimated-and-rotated ξ and ρ and the measured ones (i.e., ∆λ = (∆ξ + ∆ρ)/2).

Results
Through the registration procedure outlined above, we produced multiple registered camera positions (RCP) and finely registered camera positions (FRCP) for the two smartphone models. We used these positions (RCP and FRCP) to georeference the two smartphone models, obtaining 48 sub-models (15 pairs for Model 1 and 9 pairs for Model 2). We also produced two GNSS models, whose registration was obtained by using the XYZ coordinates of each photo, as recorded by the smartphone GNSS sensor, without considering the photo orientation data.
In CloudCompare, we automatically aligned each of these sub-models to the corresponding ground truth model (i.e., GTSModel 1 and GTSModel 2) using the "finely register" alignment function. Outputs of this automatic alignment procedure include: the scaling factor, the translation vector (with its ∆X, ∆Y, and ∆Z components), and three Euler angles, i.e., the angles of rotation around the X (Psi angle), Y (Theta angle), and Z axis (Phi angle), along with the sum of the moduli of these angles. In Figure 8, translation and rotation parameters of the alignment to the ground truth models are plotted against the data number (i.e., the number of photos used to register the sub-model) of the sub-models and the ∆λ parameter (∆λ is computed using those photos employed to register each sub-model). In all the graphs, the highest values of data number and ∆λ correspond to models for which misoriented photos have not been removed yet. Hence, each subsequent data point from right to left corresponds to a successive iteration. The two GNSS models are unique and consequently are represented in the graphs of Figure 8 as two horizontal lines at constant value. Figure 8A shows that the progressive removal of misaligned photos produces a slight increase in quality in terms of translation, with up to 1.5 m of gain in positional agreement for the FRCP Model 2 with respect to the GNSS Model 2. Conversely, the impact that this iterative optimization procedure has on the rotational component of model registration is significant (Figure 8B). For Model 1, the sum of rotation angles passes from almost 12° for the GNSS model to less than 6° for many FRCP and RCP sub-models. For Model 2, the sum of the rotation components passes from 6° for the GNSS model to less than 2° for the FRCP sub-models, down to less than 1° for some of these sub-models.
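The Psi/Theta/Phi angles above come from decomposing the 3 × 3 rotation matrix of the alignment transformation. A minimal sketch of such a decomposition follows, assuming the Rz(Phi)·Ry(Theta)·Rx(Psi) composition order and ignoring gimbal-lock cases; CloudCompare's internal convention may differ, so this is illustrative.

```python
import numpy as np

def euler_zyx_from_matrix(R: np.ndarray):
    """Decompose R = Rz(phi) @ Ry(theta) @ Rx(psi) into angles in degrees
    (gimbal-lock cases, |R[2,0]| ~ 1, are ignored for brevity)."""
    theta = np.degrees(np.arcsin(-R[2, 0]))          # rotation about Y
    psi = np.degrees(np.arctan2(R[2, 1], R[2, 2]))   # rotation about X
    phi = np.degrees(np.arctan2(R[1, 0], R[0, 0]))   # rotation about Z
    return psi, theta, phi

def rot_z(deg: float) -> np.ndarray:
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

# A pure 5 degree rotation about Z should yield Phi = 5, Psi = Theta = 0.
psi, theta, phi = euler_zyx_from_matrix(rot_z(5.0))
print(round(phi, 3))  # 5.0
```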
It is worth noting that the minimum angular displacement is obtained when ∆λ ∼2°, corresponding to sub-model 5, for which the data number (i.e., the number of used photos) is 19. Concerning the individual rotation components, Psi and Theta tend to stabilize in models with ∆λ < 2°, and the difference between each FRCP and RCP sub-model is <1°. The Phi angle (rotation around the Z axis) is instead highly variable. It is even higher than in the GNSS models for the RCP models and the FRCP Model 1. For the FRCP Model 2, it is <1° regardless of the ∆λ value.
We also plotted the average distance between the measured and reconstructed positions of photos (point to point distance) (Figure 9A) and the scaling factor against the number of photos used to register the sub-model and ∆λ (Figure 9B). The scaling factor is also plotted against the slope of the ∆λ vs. (i) normalized PAP (Figure 9C) and (ii) view direction (ξ) (Figure 9D). These plots allow us to evaluate whether scaling is influenced by the spatial distribution and orientation of the photographs used for registration and, ultimately, to quantify how much camera misalignment related to lens distortion affects the scaling. Data illustrated in Figure 9A suggest that when photos with high ∆λ are progressively removed, the average distance between the measured and reconstructed positions of cameras slightly reduces for FRCP and RCP Model 1 and for FRCP Model 2. RCP Model 2 shows the contrary behavior, with a general increase of the point to point average distance when photos with high ∆λ values are progressively removed. With respect to scaling (Figure 9B), progressive removal of misaligned photos produces improved scaling factor accuracy for both RCP and FRCP Model 1, which approaches 1 (i.e., the model is correctly scaled) for ∆λ < 1°. Model 2 shows, instead, a more complex trend: the scaling factor first decreases from 0.97 to 0.95, then rises up to 0.98. The local minimum is attained for ∆λ values ranging between 1° and 2°. As anticipated (Figure 1), an incorrect scaling factor could be related to the non-random misalignment of cameras caused by unaccounted-for lens distortions. The effect of misaligned cameras on scaling is readily detectable in the plots of Figure 9C,D, showing that as the slopes of the ∆λ vs. PAP and ∆λ vs. view direction regressions decrease, the scaling factor approaches 1. In summary, when the slope of these regressions is ∼0, the corresponding sub-model is registered with images negligibly affected by lens distortion and the scaling is close to 1.
Figure 9. Mismatch between registered and ground truth smartphone models for the different sub-models. The mismatch includes: (A) the average distance between measured and reconstructed camera position and (B) the scaling needed to accurately align the smartphone sub-models to the corresponding ground truth model. Scaling is also compared with the slope of the best fit lines of ∆λ vs. (C) position along the diagonal path and (D) view direction (as shown in Figure 7). The plots also include the mismatch between the ground truth models and the model registered using the GNSS position (without using camera orientation information).

Rationale
In this work, we have benchmarked SfM-MVS photogrammetric 3D models created and registered via smartphone sensors against an SfM-MVS photogrammetric 3D model created via UAV-based image acquisition and registered using accurately geolocated GCPs (georeferenced using a differential GNSS antenna). The latter model has been considered ground truth data (benchmark), with the good agreement between the orthophoto derived from the UAV model and the 1:5000 topographic map of the study area largely supporting this assumption (Figure 3A). The registration of the smartphone models was performed by comparing the orientation and position of the photographs (as measured by the smartphone sensors) with the same dataset estimated in the reconstructed SfM-MVS photogrammetric scene, prior to and during an iterative procedure of misaligned camera removal.

Position and Orientation Errors
Both the measured and estimated data are affected by errors that are only partly detectable and distinguishable. Orientation data include the trend and plunge of the camera view direction (ξ) and of the direction parallel to the long axis of the camera (ρ). Since both ξ and ρ are gently plunging (<10°), the measurement of their plunge relies essentially on the inclinometer/gyroscope sensors, and errors associated with this kind of measurement are typically <1° [16], providing the most robust metric of the entire procedure. This is evidenced by values of Psi and Theta (i.e., the angles of rotation of each model to the ground truth around the X and Y axis, respectively) that are <1° for most of the models (Figure 8B). The measured trend of ξ and ρ is instead provided by the magnetometer, and it can be affected by errors of variable magnitude. In our experience as field geologists, as well as in studies that quantify the effect [34,35], the need for frequent recalibrations, along with possible sources of local magnetic fields (both internal and external to the smartphone), may imply errors of up to ten degrees, depending in part on the device. These errors can significantly affect the georeferencing of models registered with the Registered Camera Position method. Instead, for models registered via the Finely Registered Camera Position method, the occurrence of systematic errors in the measurements provided by the magnetometer does not affect the model registration. Indeed, the final rotation about the Z axis, constrained by the GNSS-measured XY coordinates of the photos, can essentially cancel out any systematic error in the trend of ξ and ρ. In agreement, our results show that a model registration based only on orientation data (i.e., RCP) produces unsatisfactory results. Consequently, this method is no longer considered in the following discussion.
Concerning the measured XYZ coordinates of the photographs, errors associated with the built-in GNSS smartphone sensor were ~4 m as recorded by the AngleCam App, as also observed in the field using the GPSTest app. The model translation error ranges from 3 to 5 m in the tested models (Figure 8A). This value includes both the systematic positioning error of the recorded GNSS locations and the contribution associated with image misalignment during the SfM scene reconstruction. The minimum translation value is attained in the FRCP sub-models of Model 2, for which the amount of translation decreases from 4 m down to 3 m as ∆λ decreases from 6° to 1°. This suggests that 3 m can be taken as an estimate of the systematic GNSS positioning error for the camera locations of Model 2. Similarly, 4-5 m is taken as the systematic positioning error for the photographs of Model 1. The rather stable values of the translation components for the FRCP models (Figure 8A) suggest that these stable values roughly correspond to the systematic error of each component. These components are congruent with the average ∆X, ∆Y, and ∆Z shown in Figure 5B,C, derived by comparing the UAV and smartphone models. In the same figure, the standard deviation of ∆X, ∆Y, and ∆Z is reported, spanning from 1.1 to 1.4 m for ∆X and ∆Y, and from 1.6 to 1.9 m for ∆Z. We take these values as proxies for the random positioning error of each component. The rotation and scaling procedures in the FRCP models are not affected by the systematic positioning error, whereas the random component may significantly impact these two steps. These errors are similar in the two models, though the length of the survey transect for Model 2 is approximately twice that of Model 1 (60 m vs. 30 m). The high values of Phi for the FRCP Model 1 (Figure 8B) indicate that a random positioning error of 1.5 m in the X and Y components for a 30 m long survey transect (5%) can lead to unsatisfactory registration results.
When the random error decreases to about 2.5% of the length of the survey transect, the quality of the registration increases drastically. Note that the random and systematic positioning errors given above are derived quantities; the only hard datum is the positioning error provided by the smartphone GNSS sensor, which we estimated to be ~4 m for both Model 1 and Model 2, corresponding to about 14% and 7% of the survey transects of Models 1 and 2, respectively.
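The error ratios above can be condensed into a simple rule of thumb for survey planning. The sketch below is a hypothetical helper, with thresholds taken from the results reported here (~2.5% of the transect length was satisfactory in our tests, ~5% was not):

```python
def registration_outlook(random_error_m, transect_m):
    """Classify expected FRCP registration quality from the ratio of
    random GNSS positioning error to survey transect length."""
    ratio = random_error_m / transect_m
    return "satisfactory" if ratio <= 0.025 else "unsatisfactory"

# ~1.5 m random XY error over the two transect lengths tested here:
print(registration_outlook(1.5, 30.0))  # 5 %   -> unsatisfactory (Model 1)
print(registration_outlook(1.5, 60.0))  # 2.5 % -> satisfactory (Model 2)
```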

Best Practice for Image Acquisition
Results presented here allow the definition of a standard registration procedure using smartphone camera position and orientation data for terrestrial SfM-MVS photogrammetric models. This procedure is aimed at mitigating positioning errors and scene reconstruction artifacts.
It is important to consider that any systematic GNSS positioning error is transferred directly to the model and cannot be removed without adding at least one externally referenced ground control point.
The advent of dual-frequency GNSS chipsets in consumer-grade smartphones has undoubtedly been a breakthrough for high-precision smartphone positioning. Nevertheless, the state of the art [36,37] recognizes that, in fast-static surveying mode, observation sessions lasting at least 20-30 min are needed to achieve centimeter-level accuracy, owing to the low quality of the smartphone carrier-phase observations, which delays integer ambiguity resolution and signal stabilization. At that rate, image acquisition sessions would last tens of hours, nullifying most of the advantages of SfM-MVS photogrammetry. Indeed, the acquisition speed adopted here (i.e., one photo every 10 s) degraded the positioning to meter-level accuracy. In practice, this speed could be reduced slightly, to one photo every one or two minutes, but the need for tens to hundreds of photographs to build a high-resolution model prevents the adoption of an acquisition speed of one photo every 20 min, which would be required to optimize positional accuracy. The second point concerns the correct planning of the acquisition area size. In many cases, a wide scene can be reconstructed using a relatively narrow acquisition area. However, correct scaling (for both the FRCP and RCP methods) and orientation (for the FRCP method) require evaluating the ratio between the GNSS positioning error and the acquisition path length, which should always be <7%. As discussed above, systematic errors in the measurements provided by the GNSS sensor and magnetometer do not affect the correct orientation and scaling of the model. Accordingly, a short acquisition period is preferable and, for the same reason, magnetometer re-calibration during image acquisition should be avoided. For consistency of data capture, including consistency of the magnetometer systematic error, the use of different smartphone devices for individual photo surveys is not recommended.
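The <7% criterion translates into a quick pre-survey check. A minimal sketch (the helper function is hypothetical; the 7% threshold is the one derived in this study):

```python
def min_acquisition_path_m(gnss_error_m, max_ratio=0.07):
    """Shortest acquisition path keeping the GNSS-error / path-length
    ratio below the threshold required for correct scaling/orientation."""
    return gnss_error_m / max_ratio

# With the ~4 m positioning error observed here, plan a path of ~57 m or more:
print(round(min_acquisition_path_m(4.0)))  # 57
```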
In comparison with the classic similarity transform used for the registration of matched point sets [14], high collinearity of photos is not detrimental to the proposed procedure. The use of convergent photos to mitigate the doming effect should be standard practice for image acquisition. However, the user must be aware that oblique photos are likely to be misaligned during the SfM scene reconstruction, as illustrated in Figure 1, and their use for registering the model should be avoided. We have shown that accurate orientation (i.e., sum of the three rotation components <2°) and scaling (i.e., scaling error <3%) can be achieved only using photographs whose measured and reconstructed orientations differ by <2°. A critical problem, with a potentially major impact on photo survey planning, is that the progressive removal of misaligned images can drastically reduce the number of photos. The risk of being left with too few photos for the model registration, insufficient to average out random positioning errors, is therefore very high. Accordingly, the user should plan an acquisition campaign with up to one order of magnitude more photographs than needed to create the model.
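The progressive-removal step can be sketched as a simple loop. In the snippet below, the mismatch angles are illustrative synthetic values, not real survey data, and `MIN_PHOTOS` is an assumed floor; in a real workflow the model would be re-registered and all mismatch angles re-computed after each removal.

```python
MAX_MISMATCH_DEG = 2.0   # orientation-agreement threshold used in this study
MIN_PHOTOS = 5           # assumed minimum to average out random GNSS error

def prune_misaligned(photos):
    """Iteratively drop the worst-matching photo until every remaining one
    agrees with its measured orientation within the threshold."""
    kept = list(photos)
    while kept:
        worst = max(kept, key=lambda p: p["mismatch_deg"])
        if worst["mismatch_deg"] <= MAX_MISMATCH_DEG:
            break
        kept.remove(worst)
        # Real workflow: re-register the model and recompute mismatches here.
    return kept

# Synthetic angular differences (deg) between measured and SfM-reconstructed
# orientations for eight photos:
photos = [{"id": i, "mismatch_deg": m}
          for i, m in enumerate([0.5, 1.8, 6.0, 1.1, 9.5, 0.9, 2.4, 1.6])]
kept = prune_misaligned(photos)
print(len(kept), "photos kept for registration")
if len(kept) < MIN_PHOTOS:
    print("too few photos left: plan a denser survey")
```

Three of the eight synthetic photos exceed the 2° threshold and are discarded, which is exactly why the acquisition campaign should oversample by up to an order of magnitude.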

Research Perspectives, Applications and Limits
Removing the need for ground control points for model registration is an invaluable improvement that facilitates SfM-MVS photogrammetric reconstruction of poorly accessible areas. The most striking example is the reconstruction of extraterrestrial scenes. Indeed, both the position and orientation of images are available for Martian rover cameras [38], making it now virtually possible to produce (extra)terrestrial SfM-MVS photogrammetric DEMs, provided there is sufficient overlap between the extraterrestrial images [39]. Additional fields of application include any case in which high-accuracy RTK or differential GNSS antennas cannot be employed, for reasons of logistics or cost, including, for example, common geological fieldwork in which the acquisition of a digital outcrop was not planned but becomes necessary once in the field. In those cases, the operator has to find the right compromise between acquisition time, scene size, and the required accuracy of the model. In fact, the resolution of the model is primarily determined by the ground sampling distance of the images, whereas its accuracy depends on the georeferencing procedure. We have seen that, using a dual-frequency smartphone, rapid acquisition (i.e., one photo every 10 s) leads to positioning errors on the order of a few meters. When the acquisition area width is >50 m, such an error barely affects scaling (the error is on the order of 2%) and orientation (sum of the three rotation components <2°), whereas the positioning error of the model equals that of the smartphone GNSS. In this scenario, orientation data and dimensional parameters of features seen in the models (e.g., fracture spacing, fault orientation, slope angle of river streams, offset of faulted rivers) can be successfully measured. As seen in our work, when the acquisition area becomes <50 m wide, a positioning error of a few meters also affects the orientation of the model.
To overcome this issue and take full advantage of the potential of dual-frequency GNSS smartphones, the acquisition speed must be reduced. As mentioned above, at least 20-30 min of stabilization are needed to achieve centimeter-level accuracy [36,37], whereas sessions of <10 min achieve accuracies of tens of centimeters [37]. Accordingly, for acquisition areas 10-50 m wide, an acquisition speed of one photo every 2-3 min should yield a positioning accuracy sufficient to scale and orient the model. A 3-axis gimbal stabilizer may greatly help in this case. In all cases, a practice worth exploring, which could drastically reduce positioning errors, is to include in the dataset just a few photos (oriented perpendicular to the scene, since oblique photos are generally mispositioned and not useful for registration) for which the GNSS signal has been recorded for 10-20 min. Acquisition areas <5 m wide become challenging for smartphone GNSS positioning. In those cases, the XY positions of the images cannot be used to mitigate the measurement error of the magnetometer, and the orientation of the scene can be achieved only by integrating image orientation data with the orientations of previously measured features seen in the scene [21]. Similarly, scaling requires including in the scene an object of known dimensions, whereas long-lasting GNSS signal recording can lead to cm-level positioning accuracy.
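The trade-off between per-photo dwell time and total session length is simple arithmetic; the sketch below makes it explicit (the photo count of 100 is an assumed mid-size survey, not a figure from this study):

```python
def session_minutes(n_photos, seconds_per_photo):
    """Total acquisition time for a survey shot at a fixed cadence."""
    return n_photos * seconds_per_photo / 60.0

n = 100  # assumed number of photos for a mid-size model
print(session_minutes(n, 10))       # rapid cadence: ~17 min, metre-level accuracy
print(session_minutes(n, 150))      # one photo per 2.5 min: 250 min
print(session_minutes(n, 20 * 60))  # 20 min/photo: 2000 min (~33 h), impractical
```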

Conclusions
We have proposed a method for terrestrial structure from motion-multiview stereo (SfM-MVS) photogrammetry that employs smartphone sensors and removes the need for ground control points for model registration. The position and orientation of the photos cannot be used directly to georeference the model. Instead, a recursive procedure of model registration and image deletion must be carried out until all images whose position in the virtual scene was incorrectly reconstructed in the SfM process are removed. The remaining photos can then be used to produce a satisfactory georeferencing of the model, provided the ratio between the GNSS positioning error and the width of the acquisition area is ~7% or less. The main advantage of this method over previous ones is the extreme ease of use and portability of the tools involved in the registration of 3D models.