Smartphone-Based Photogrammetry for the 3D Modeling of a Geomorphological Structure

The geomatic survey in the speleological field is one of the main activities that adds both scientific and popular value to cave exploration, and it is of fundamental importance for a detailed knowledge of the hypogean cavity. Today, the available instruments, such as laser scanners and metric cameras, allow us to quickly acquire data and obtain accurate three-dimensional models, but they are still expensive, require a careful survey planning phase, and need some operator experience to manage them. This work analyzes the performance of a smartphone device in a close-range photogrammetry approach for the extraction of accurate three-dimensional information of an underground cave. The image datasets acquired with a high-end smartphone were processed using the Structure from Motion (SfM)-based approach for dense point cloud generation: different image-matching algorithms implemented in a commercial software package, an open source software package and a smartphone application were tested. In order to assess the reachable accuracy of the proposed procedure, the achieved results were compared with a reference dense point cloud obtained with a professional camera or a terrestrial laser scanner. The approach showed a good performance in terms of geometrical accuracy, computational time and applicability.


Introduction
Geomorphological and geological observation in underground environments such as caves still represents a challenge. These environments are often hostile, given the difficulty of access and the sometimes extreme humidity and temperature conditions inside them. The continuous evolution of sensor-based 3D surveying and modeling techniques and the development of increasingly efficient systems for the visualization of digital data highlight the added value of these methods in the geological field. Indeed, the technological development of surveying instruments has made it possible to move from subjective descriptions of the environment [1], influenced by human perception, to full three-dimensional models. However, only in the last few years have lighter, cheaper and easier-to-use surveying instruments become available on the market. The most recent electronic surveying equipment makes it possible to obtain a complete description of the environment and to represent all the morphological details that characterize floors, walls and roofs, which are fundamental for drawing conclusions on the evolution of underground environments.
To date, besides the instruments typical of classical topography, such as Total Stations (TS), both active and passive sensors can be used [2]. Active sensors are now widely used in underground caves to extract 3D information [3-11]; they include all the scanning devices that emit a signal (in the form of a coded structured light pattern or of electromagnetic radiation, such as modulated laser light) and record information from the observed scene. Laser scanners are the most popular specific applications installed on the mobile, now widely available in stores [25-29]. For example, smartphones such as the Sony Xperia XZ1 are sold with an integrated 3D modeling application [30]. In the literature, several studies address 3D reconstruction methodologies based on images acquired with smartphones or tablets for different analyses in multiple domains. Aicardi et al. [24] and Fritsch and Syll [31] made first attempts at exploiting close-range photogrammetry with smartphone images to reconstruct morphologically complex architectural elements, both for the documentation and scientific dissemination of cultural heritage and for the digital preservation of historical buildings and smaller objects of historical, architectural and artistic relevance. A further advantage derives from the easy availability of these tools, which makes them suitable for emergency situations, when the goal is to obtain a complete description of the object in the shortest possible time. This aim was pursued by Dabove et al. [32], who demonstrated the effectiveness of close-range photogrammetry for rapid mapping in post-earthquake scenarios, using images acquired with tablets. Since a complete object model cannot be created with TLS or photogrammetry alone, in both cases the collected points can be interpolated to obtain a 3D model, removing the so-called shadows (occluded areas).
Micheletti et al. [33] investigated the potential of hand-held smartphone technology for acquiring high-resolution topographic and terrain data to generate digital terrain models (DTMs) of a river bank and an Alpine alluvial fan. These studies highlighted that an image-based approach exploiting mass-market devices allows geometrically correct models to be reconstructed at different scales with centimeter-level accuracy or better. Moreover, in addition to the geometric properties of the object of study, such as its shape and volume, this methodology also allows its radiometric information [34] to be extracted; this information characterizes each material and, for some applications, is a fundamental aspect to take into account in order to define its properties and values.
The aim of this work is to investigate the potential of a smartphone device for the 3D reconstruction of objects of geomorphological interest and to propose a photogrammetry-based procedure that can be easily used, even by non-expert operators. This refers in particular to researchers and experts interested in the geological and geomorphological evaluation of the different formations present in an underground cave. Before applying this technique in an underground cave, the performance obtainable with a smartphone device is tested in a laboratory environment, in order to verify the precision and accuracy achievable with these low-cost instruments. It is also important to define an operational methodology that can be proposed to non-expert users in the field and that ensures a correct acquisition of images for the generation of 3D models; such models are fundamental for a good geological, geomorphological and tectonic knowledge of the subsoil, both for purely speleological or geological purposes and for the stability analysis of cavities.
The proposed methodology is designed to allow acquisitions with easily transportable instruments (commercial cameras, tablets and smartphones), which are also available locally and can be used in limited spaces: the elements of interest and the areas to be surveyed are often difficult to reach and allow only short stops, so that it is impossible to operate with LiDAR instruments. Moreover, while methods based on active sensors ensure rapidity in the survey and in the processing of dense clouds, image-based procedures allow the use of less expensive equipment to obtain the same type of final product. The main novelty of this contribution is the application of computer vision (CV) algorithms to images acquired with a smartphone, exploiting both well-known software solutions and, above all, a smartphone application, which is a rather new approach in this field. Only smartphone images were considered, without integrating any other type of data (e.g., TLS points) into the model. Dense image matching techniques were then used to build a dense 3D point cloud, to be used as a support for geological and geomorphological investigations and for any other activity that could benefit from a dense 3D model with a high level of detail. With the aim of also involving non-expert users in the image-processing phase, low-cost or open source tools were also tested: the acquired images were processed using some of the best-known software tools implementing CV algorithms, which are now also implemented in some smartphone applications (e.g., SCANN3D) [25], in order to evaluate their behavior in terms of processing and results.
The possibility of reducing survey costs with low-cost equipment, such as digital and amateur cameras, smartphones, tablets and action cams, is one of the main advantages of the photogrammetric technique, which also allows a high repeatability of the survey. In this contribution, we propose a standardized procedure for the 3D reconstruction of underground caves through a photogrammetric survey with a smartphone, which yields high-quality results in terms of geometric accuracy. To define the methodology, we analyzed the limits of this approach, such as the influence of the distance from the object and of image overlap on the processing and the products. For this purpose, preliminary tests were conducted in a controlled environment, using a rock sample as the object of study. Finally, the proposed approach was tested in a real underground cave.

Case Study
The present research aimed at the three-dimensional reconstruction of a portion of the Cave of Bossea (CN), Italy. The Cave of Bossea is considered among the most beautiful and geologically interesting caves in Europe, and it also hosts an underground karst laboratory managed by the Bossea Scientific Station of the Italian Alpine Club (Club Alpino Italiano-CAI) of Cuneo and by the Department of Environmental, Land and Infrastructure Engineering of the Politecnico di Torino (Italy), in collaboration with the Cuneo Department of the Piedmont Regional Agency for the Protection of the Environment (Agenzia Regionale per la Protezione Ambientale-ARPA) and the Radiation Section of ARPA Valle d'Aosta. This laboratory is of national importance for the study of water circulation in carbonate rocks, of the organization and evolution of karst aquifers, of genetic and lithogenic speleological processes, of atmospheric constituents, of the microclimate, and of the energy balance of the underground environment [35].
The preliminary test of the proposed methodology was performed at the Geomatics Laboratory of DIATI (Department of Environmental Engineering, Land and Infrastructure) of the Politecnico di Torino, using a sample of calcareous material extracted from the Cave of Bossea (Figure 1).
The sample measured about 60 × 7 × 3 cm and, because of its conformation, was of particular interest for the study of water circulation in carbonate rocks. Moreover, it shows clear radiometric variations due to the characteristic components of the water deposited over time, which represent the main aspect of study of the object. The test was performed in optimal lighting and environmental conditions, in order to evaluate the potential of the procedure: all images were collected in a controlled environment, both in terms of environmental conditions (stable temperature of 25 °C and constant humidity of about 55%) and of lighting: the sample was placed on a non-reflective table and illuminated with professional lighting.
Subsequently, the methodology was stress-tested in a real underground cave environment, with variable illumination and critical humidity or temperature conditions. For this purpose, three walls inside the cave were selected (Figure 2 and Table 1). The three chosen surfaces, characterized by the typical irregularity of these environments, differ in their lighting: point-like in the first case, scarce in the second and diffuse in the third. Furthermore, a parapet limits movement for safety reasons, so it is not possible to get close enough to the various areas to inspect the recesses in more detail. Moreover, most of the surfaces are covered by a thin layer of water, which increases considerably in the most humid periods and makes them highly reflective.
Appl. Sci. 2019, 9, x FOR PEER REVIEW 5 of 21

3D Model Generation: Methodology Assessment in a Controlled Environment
The photogrammetric technique allows us to determine the shape, size and position of an object, using as the main data the information contained in the images representing the object itself.
In the present work, a test was performed to analyze the possibility of acquiring images with a common smartphone and using them for 3D modeling, in order to obtain a detailed description of a geomorphological object and to evaluate whether the procedure can be extended to a part of an underground cave.

Imaging Sensor Calibration
To solve the photogrammetric problem and extract metric information from the images, the internal orientation parameters of the camera must first be estimated through a calibration procedure. The internal orientation consists of the estimation of the position of the projection center with respect to the fiducial system on the image plane [17], and it is defined by the coordinates of the principal point (x0, y0) and by the focal length (f).
These parameters refer to an ideal acquisition geometry; when a non-metric camera is used, such as those integrated in smartphones, it is fundamental to take into account the distortions due to the physical geometry of the lenses (radial and tangential distortion) and the real geometry of the camera sensor (skew). Therefore, to correctly estimate the internal orientation, it is necessary to know the magnitude of the distortions introduced by the optical system and the resulting deformations of the image.
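To make the role of these parameters concrete, the sketch below projects a point expressed in the camera frame to pixel coordinates using a principal point, focal lengths, a skew term and Brown-Conrady radial (k1, k2, k3) and tangential (p1, p2) distortion. It assumes the distortion form used by the Camera Calibration Toolbox for Matlab, with the skew given directly in pixels (i.e., the focal length multiplied by the toolbox's skew coefficient). The class and function names and all numerical values are illustrative assumptions, not the calibration results of Table 3.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Internal orientation of a pinhole camera, expressed in pixels."""
    fx: float           # focal length along the image x axis
    fy: float           # focal length along the image y axis
    x0: float           # principal point, x coordinate
    y0: float           # principal point, y coordinate
    skew: float = 0.0   # skew term in pixels (non-orthogonal sensor axes)
    k1: float = 0.0     # radial distortion coefficients
    k2: float = 0.0
    k3: float = 0.0
    p1: float = 0.0     # tangential (decentering) distortion coefficients
    p2: float = 0.0


def project(point_cam, cam):
    """Project a 3D point expressed in the camera frame (Z > 0) to pixel coordinates."""
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("the point must lie in front of the camera")
    # Ideal (distortion-free) normalized image coordinates.
    x, y = X / Z, Y / Z
    r2 = x * x + y * y
    # Radial component of the Brown-Conrady model.
    radial = 1.0 + cam.k1 * r2 + cam.k2 * r2 ** 2 + cam.k3 * r2 ** 3
    # Tangential (decentering) component.
    dx = 2.0 * cam.p1 * x * y + cam.p2 * (r2 + 2.0 * x * x)
    dy = cam.p1 * (r2 + 2.0 * y * y) + 2.0 * cam.p2 * x * y
    xd, yd = x * radial + dx, y * radial + dy
    # Focal length, skew and principal point map the distorted coordinates to pixels.
    u = cam.fx * xd + cam.skew * yd + cam.x0
    v = cam.fy * yd + cam.y0
    return u, v


# Illustrative values only (not the OnePlus 5 calibration).
cam = Intrinsics(fx=3000.0, fy=3000.0, x0=2000.0, y0=1500.0, k1=0.1)
u, v = project((0.1, 0.0, 1.0), cam)  # positive k1 pushes u slightly beyond 2300 px
```

Calibration works in the opposite direction: it searches for the parameter values that make such projections of the known panel points agree with the corners measured in the images.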

Since the aim of this study is to verify the performance of popular mass-market devices in the survey of geomorphological structures of underground caves, a mid-range smartphone was considered: the OnePlus 5, whose characteristics are described in Table 2. Generally, before using images for 3D reconstruction, it is important to analyze the intrinsic characteristics of the digital camera, i.e., to know the parameters with which the images were taken, in order to study the acquisition geometry of the system.
The diffusion of these devices has led, over the years, to the development of several algorithms and procedures for estimating the internal orientation parameters [17,36]. In this case study, we adopted the test-field calibration strategy [37], which is based on a special gridded panel with well-known coordinates or distances (Figure 3).

To perform the estimation, the free software Octave [38] was used; in particular, an adaptation of the open source Camera Calibration Toolbox for Matlab [39] allows the calibration parameters to be obtained from images of a checkerboard panel taken with the camera.
This panel has to be acquired from different positions, ensuring a good intersection of the rays and covering the entire sensor format. The images must be captured with the optical axis perpendicular to the panel or convergent, and some images should be rotated by 90° around the optical axis with respect to the others. In our case, the 1 × 0.7 m checkerboard was made on a wooden panel to limit its deformation. The procedure then requires taking a series of photos of the panel from different angles, which are then loaded into the toolbox.
The calibration must be conducted with the same sensor configuration (type of acquired data and resolution) set during the survey, in order to compute consistent intrinsic parameters. Therefore, the OnePlus 5 was calibrated using frames extracted from the acquired videos. Indeed, the survey campaign was carried out with the smartphone in video mode, to ensure a complete, fast and stabilized data acquisition and to avoid blur. Moreover, the same videos could then be exploited for further purposes, such as smartphone-based navigation of indoor environments [40]. Twenty frames selected from the recorded video were loaded into the toolbox, which identified the grid corners from their known size (10 cm squares in our checkerboard panel) and extracted the camera parameters (Table 3).
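The object-space input of a checkerboard calibration is simply the planar grid of corner coordinates, which the toolbox pairs with the corners detected in each frame. The sketch below generates that grid. It assumes, since the text does not say so, that the 1 × 0.7 m panel is fully covered by 10 cm squares, giving 10 × 7 squares and therefore 9 × 6 inner corners; the function name is hypothetical. After conversion to the array format it expects, the same list could serve as the per-frame object points of a routine such as OpenCV's `cv2.calibrateCamera`.

```python
def checkerboard_object_points(cols, rows, square_size):
    """Planar (Z = 0) coordinates of the inner corners of a checkerboard.

    cols, rows: number of inner corners along the two panel axes.
    square_size: side length of one square, in metres.
    Corners are listed row by row, starting from the corner taken as origin.
    """
    if cols < 1 or rows < 1 or square_size <= 0:
        raise ValueError("invalid checkerboard geometry")
    return [(c * square_size, r * square_size, 0.0)
            for r in range(rows) for c in range(cols)]


# Hypothetical layout: a 1 x 0.7 m panel fully covered by 10 cm squares
# has 10 x 7 squares, i.e. 9 x 6 inner corners.
SQUARE_SIZE_M = 0.10
INNER_COLS, INNER_ROWS = 9, 6
corners = checkerboard_object_points(INNER_COLS, INNER_ROWS, SQUARE_SIZE_M)

# The same object-space grid is paired with the corners detected in every
# calibration frame (twenty frames in the text).
object_points_per_frame = [corners for _ in range(20)]
```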

Data Acquisition
The test addressed the assessment of the potential of manual data acquisition for the survey of an object or a small area. The survey campaign was carried out with the OnePlus 5 smartphone in video mode, in order to ensure a stabilized acquisition and to avoid blur. At the same time, some images were captured with a NIKON D800E camera (Table 4), a high-quality DSLR; thanks to its technical performance, it was employed to generate a reference dataset for assessing the geometric quality of the 3D point clouds produced with the smartphone. The acquisition of the frames must be carefully designed and must take into account the distance between the different acquisition points, defined according to the scale of representation, the scale of the frame and the resolution of the image. In general, the acquisition geometry must ensure sufficient overlap between consecutive images in both the longitudinal and the transversal directions; for approaches based on Structure from Motion algorithms, this overlap reaches values of 60-75% [41].
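The relation between shooting distance, image resolution and overlap can be quantified with the usual normal-case formulas. The sketch below assumes a pinhole camera facing a roughly planar, fronto-parallel surface; the function names and the lens and sensor values are hypothetical placeholders, not the OnePlus 5 specifications of Table 2.

```python
def footprint_m(distance_m, focal_mm, sensor_mm):
    """Object-space extent covered by one sensor side at the given distance."""
    return distance_m * sensor_mm / focal_mm


def gsd_m(distance_m, focal_mm, sensor_mm, image_px):
    """Ground (object) sampling distance: size of one pixel projected on the object."""
    return footprint_m(distance_m, focal_mm, sensor_mm) / image_px


def max_baseline_m(distance_m, focal_mm, sensor_mm, overlap):
    """Largest camera shift along one sensor axis that still keeps the requested overlap."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    return footprint_m(distance_m, focal_mm, sensor_mm) * (1.0 - overlap)


# Hypothetical smartphone camera: 4 mm lens, 6 mm sensor width, 4000 px image width.
F_MM, SENSOR_W_MM, WIDTH_PX = 4.0, 6.0, 4000
for d in (0.5, 1.0, 2.0):
    gsd = gsd_m(d, F_MM, SENSOR_W_MM, WIDTH_PX)
    step = max_baseline_m(d, F_MM, SENSOR_W_MM, overlap=0.70)
    print(f"distance {d:.1f} m: GSD {gsd * 1000:.2f} mm, max step for 70% overlap {step:.2f} m")
```

Both quantities grow linearly with distance, which is why the parapet that keeps the operator away from the cave walls directly lowers the achievable detail.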
The smartphone was held by the operator, who moved around the object trying to obtain a complete description; the auto-focus function of the camera was disabled. In a few minutes, a dataset of more than 5000 frames was collected, but only some of them were employed for the 3D reconstruction (Figure 4).

Data Processing
In order to evaluate the effectiveness of the smartphone-based methodology for the 3D reconstruction of geomorphological objects, the frames extracted from the video acquisition were processed and aligned with different software packages. For this purpose, two of the most popular software packages for image processing were selected: the commercial Agisoft Metashape Professional (AMP) [42] and the open source VisualSfM (VSFM) [43]. Moreover, in order to evaluate the possibility of generating a point cloud almost in real time and on the acquisition site, an Android application, SCANN3D, was tested. SCANN3D performs the 3D reconstruction locally and fairly rapidly, working entirely offline rather than in the cloud. The main benefit of the application is the real-time guidance during image capture: it tracks multiple points in the field of view, which turn from red to green when the camera has been moved sufficiently, ensuring a good image overlap. However, this application does not lock focus and exposure after acquiring the first image, which can negatively affect the 3D reconstruction. In order to obtain results comparable with those of the other software, the same image dataset was imported and processed in the application.
These software packages implement different Computer Vision (CV) algorithms for image processing and matching, allowing a large number of images to be processed almost automatically. These image-based reconstruction methods are generally based on the Structure from Motion (SfM) [44] approach and on Dense Image Matching (DIM) or Multi-View Stereo (MVS) algorithms [45], and they are characterized by five main steps, which require only the acquired images as input data:
- Feature extraction and identification of matching points between images through different algorithms, such as the Scale Invariant Feature Transform (SIFT) algorithm [46] or its modified version [47];
- Outlier filtering, exploiting the Random Sample Consensus (RANSAC) algorithm [48], and robust estimation of the acquisition geometry through linear models that impose geometric constraints;
- Estimation of the 3D coordinates of the object, of the internal and external orientation parameters and of the related statistical information on the accuracy of the computation, through the bundle adjustment technique [49];
- Dense surface reconstruction starting from the points identified for the orientation, exploiting several algorithms, including the Exact, Smooth, Height-field and Fast methods, to describe the object surface and its main discontinuities [45];
- Dense point cloud interpolation and texturing, in order to obtain a photorealistic visualization.
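The outlier-filtering step can be illustrated with a minimal hypothesize-and-verify loop. In an actual SfM pipeline RANSAC is applied to models such as the fundamental or essential matrix estimated from feature correspondences; the sketch below swaps in a 2D line as a toy model so that it stays short and dependency-free. The function names, threshold and iteration count are illustrative assumptions.

```python
import random


def line_through(p, q):
    """Normalized line a*x + b*y + c = 0 (a^2 + b^2 = 1) through two points, or None."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0.0:
        return None  # degenerate minimal sample (coincident points)
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def ransac_line(points, threshold, iterations=200, seed=0):
    """Hypothesize-and-verify loop returning (best_line, inlier_indices)."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(iterations):
        # 1. Hypothesize: fit the model to a random minimal sample (two points).
        line = line_through(*rng.sample(points, 2))
        if line is None:
            continue
        a, b, c = line
        # 2. Verify: collect points whose perpendicular distance is within the threshold.
        inliers = [i for i, (x, y) in enumerate(points)
                   if abs(a * x + b * y + c) <= threshold]
        # 3. Keep the hypothesis with the largest consensus set.
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
    return best_line, best_inliers


# Toy data: 20 observations consistent with y = 2x + 1 plus 4 gross mismatches.
pts = [(float(x), 2.0 * x + 1.0) for x in range(20)]
pts += [(5.0, 40.0), (10.0, -30.0), (3.0, 25.0), (15.0, 0.0)]
line, inliers = ransac_line(pts, threshold=0.1)
```

Practical implementations usually re-estimate the model by least squares on the final inlier set and adapt the number of iterations to the observed inlier ratio.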
The software packages used differ in the algorithms involved in the image processing and 3D reconstruction pipeline; however, they all generate a dense point cloud and, except for VSFM, a textured surface model. The model can be georeferenced during image processing by importing, together with the images, the EXIF metadata containing the camera acquisition positions; otherwise, some common points can be manually identified in the images and assigned their coordinates in the chosen reference system (e.g., topographically measured markers).
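The second georeferencing option, assigning known coordinates to a few identified points, amounts to estimating a similarity transformation between the model frame and the target reference system. Georeferencing a 3D model normally uses a seven-parameter 3D transformation; to keep the example dependency-free, the sketch below solves only the planar four-parameter (2D Helmert) case in closed form by least squares. The function names are hypothetical.

```python
import math


def fit_similarity_2d(src, dst):
    """Least-squares 2D similarity (Helmert) transform mapping src points onto dst.

    Model: X = a*x - b*y + tx,  Y = b*x + a*y + ty,
    so that scale = hypot(a, b) and rotation = atan2(b, a).
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two corresponding points")
    n = len(src)
    mx = sum(p[0] for p in src) / n
    my = sum(p[1] for p in src) / n
    mX = sum(p[0] for p in dst) / n
    mY = sum(p[1] for p in dst) / n
    num_a = num_b = den = 0.0
    for (x, y), (X, Y) in zip(src, dst):
        xc, yc, Xc, Yc = x - mx, y - my, X - mX, Y - mY  # centred coordinates
        num_a += xc * Xc + yc * Yc
        num_b += xc * Yc - yc * Xc
        den += xc * xc + yc * yc
    if den == 0.0:
        raise ValueError("source points are all identical")
    a, b = num_a / den, num_b / den
    # Translation maps the source centroid onto the target centroid.
    tx = mX - (a * mx - b * my)
    ty = mY - (b * mx + a * my)
    return a, b, tx, ty


def apply_similarity_2d(params, point):
    """Transform one (x, y) point with parameters from fit_similarity_2d."""
    a, b, tx, ty = params
    x, y = point
    return a * x - b * y + tx, b * x + a * y + ty


def scale_and_rotation(params):
    """Scale factor and rotation angle (degrees) encoded by (a, b)."""
    a, b, _, _ = params
    return math.hypot(a, b), math.degrees(math.atan2(b, a))
```

With more than the minimum number of points, the residuals of the fit give a first indication of the quality of the marker coordinates.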
A first test was conducted using the advanced image-based software package AMP, which is currently considered one of the best solutions for generating an accurate and complete dense point cloud in a very simple way. However, AMP can be considered a "black box", since the implemented algorithms are not described in detail.
Subsequently, we tested the performance of the open source software VSFM with its integrated CMVS/PMVS algorithms [50]. All the processing steps of this tool are well known, thanks to the information disseminated by the developers and the scientific community. Finally, the same datasets processed with the previously mentioned software were processed with the SCANN3D application, in order to evaluate its performance.
Since the conditions of particular environments require fast data acquisition, we assessed the minimum number of images necessary to ensure a complete description of the object. To this end, three datasets were created with frames extracted from the video, containing 50, 25 and 12 images, respectively; each guarantees an overlap greater than 70% between images.
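The text does not state how the 50-, 25- and 12-image subsets were drawn from the video. One simple option, shown here purely as an assumption, is uniform temporal sampling; it approximates uniform spatial spacing only if the operator moves at a roughly constant speed, so the resulting overlap still has to be checked against the 70% requirement. The function name and the 5000-frame length are hypothetical placeholders.

```python
def evenly_spaced_frames(total_frames, n):
    """Indices of n frames spread uniformly (in time) over a video of total_frames frames."""
    if not 1 <= n <= total_frames:
        raise ValueError("n must be between 1 and total_frames")
    if n == 1:
        return [0]
    step = (total_frames - 1) / (n - 1)
    # Half-up rounding; since step >= 1, consecutive indices stay distinct.
    return [int(i * step + 0.5) for i in range(n)]


# Hypothetical video length; the text only states "more than 5000 frames".
TOTAL_FRAMES = 5000
subsets = {n: evenly_spaced_frames(TOTAL_FRAMES, n) for n in (50, 25, 12)}
```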
To obtain comparable results, the High setting was used in AMP and the Ultra setting in SCANN3D, whereas VSFM does not allow such a quality setting to be selected.
Data processing and analyses were carried out on an HP EliteBook computer with an Intel® Core™ i7-4600U CPU (2.1-2.7 GHz), 16 GB of RAM and a 64-bit architecture; all processing times refer to this machine. Table 5 reports information on the dense point cloud computation and shows how the results vary from one software package to another. An initial analysis of these data makes it clear that AMP generates products that are much denser than those of the other software, although it requires a longer computational time. It can also be observed that, while AMP reconstructed the photogrammetric model from all three datasets, twelve images were not sufficient for VSFM to extract 3D information from the object. Furthermore, SCANN3D could not reconstruct the photogrammetric model from the 50-image dataset, probably because the smartphone processor was not powerful enough to handle such a computational load.

Reference Data
The resulting products were validated by comparing each point cloud with a reference point cloud generated from frames acquired with the NIKON D800E camera. In order to compare the models in the same local reference system, ten natural points on the rock sample were topographically measured with a total station and used to orient each dataset through a bundle block adjustment. All measurements were adjusted with the MicroSurvey StarNet v.7.0 software to obtain the final coordinates, which were determined with millimeter accuracy. Six points were used as control points for georeferencing the point clouds, while the other four were used as check points to evaluate the residuals.
The 33 images were processed with AMP, yielding an estimated root mean square error (RMSE) of less than 0.5 mm on the check points and a point cloud of more than two million points (Figure 5).
Table 6 summarizes the characteristics of the reference dense point cloud. To further evaluate the accuracy of this reference model and, specifically, its scale, we used three points external to the rock sample, measured with the total station. The distances between these points were computed from the surveyed coordinates, and the same distances were measured on the reference 3D dense point cloud. The differences between these distances are less than 0.3 mm, which confirms the accuracy of the model, considering that the average GSD is 0.032 mm/pixel.
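The scale check on the three external points amounts to comparing every pairwise distance computed from the total-station coordinates with the same distance measured on the cloud. A minimal sketch follows; the coordinates are hypothetical placeholders, and the 0.3 mm figure is reused only as an illustrative tolerance.

```python
import itertools
import math


def pairwise_distances(points):
    """Map each index pair (i, j), i < j, to the 3D distance between the points."""
    return {(i, j): math.dist(points[i], points[j])
            for i, j in itertools.combinations(range(len(points)), 2)}


def distance_discrepancies(surveyed, on_cloud):
    """Cloud distance minus surveyed distance for every pair of points."""
    ref = pairwise_distances(surveyed)
    cloud = pairwise_distances(on_cloud)
    return {pair: cloud[pair] - ref[pair] for pair in ref}


# Hypothetical coordinates in mm (not the values measured in this study).
surveyed = [(0.0, 0.0, 0.0), (300.0, 0.0, 0.0), (0.0, 400.0, 0.0)]
on_cloud = [(0.1, 0.0, 0.0), (300.2, 0.0, 0.0), (0.0, 400.1, 0.0)]
diffs = distance_discrepancies(surveyed, on_cloud)
for pair, d in diffs.items():
    print(pair, f"{d:+.3f} mm")
print("scale check passed:", all(abs(d) < 0.3 for d in diffs.values()))
```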

Results and Comparisons
A first statistical analysis of the point clouds obtained with a low-cost tool was carried out by analyzing the estimated residuals on the control points and on the check points (Table 7). For the orientation of each dataset through a bundle block adjustment, the same six control points measured for the reference point cloud were used, while the remaining four were used as check points. The obtained values indicate the overall geometric accuracy of the photogrammetric models generated through the SfM approach. However, this type of analysis can only be carried out on data processed with AMP and VSFM, because SCANN3D does not allow the model to be georeferenced with control points, since the entire modeling process is automatic. According to the results (Table 7), for this type of object, increasing the number of images does not necessarily increase the quality of the final product. In fact, the uniform radiometry may hinder the correct alignment of the images; moreover, excessive overlap between images, due to the limited dimensions of the object, may lead to a less accurate solution.
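The residual statistics in Table 7 are RMSE values on control and check points. The sketch below shows that computation for a set of check points, under the assumption that the RMSE is taken over 3D coordinate differences between surveyed and model coordinates; the coordinates and residuals are hypothetical and are not the values of this study.

```python
import math


def rmse_xyz(estimated, reference):
    """Per-axis RMSE (X, Y, Z) and total 3D RMSE between corresponding points."""
    if len(estimated) != len(reference) or not reference:
        raise ValueError("need the same, non-zero number of points")
    n = len(reference)
    per_axis = [
        math.sqrt(sum((e[k] - r[k]) ** 2 for e, r in zip(estimated, reference)) / n)
        for k in range(3)
    ]
    # The total 3D RMSE is the root of the summed per-axis mean squares.
    total = math.sqrt(sum(v ** 2 for v in per_axis))
    return per_axis, total


# Hypothetical check-point coordinates (mm): surveyed vs. read on the model.
surveyed = [(10.0, 20.0, 5.0), (80.0, 25.0, 7.0), (45.0, 90.0, 3.0), (60.0, 60.0, 12.0)]
residuals = [(0.3, 0.0, 0.0), (0.0, 0.4, 0.0), (0.0, 0.0, 0.3), (0.4, 0.0, 0.0)]
model = [tuple(s + r for s, r in zip(p, d)) for p, d in zip(surveyed, residuals)]

per_axis, total = rmse_xyz(model, surveyed)
print([round(v, 3) for v in per_axis], round(total, 3))  # → [0.25, 0.2, 0.15] 0.354
```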
A first inspection of the photogrammetric models shows that those generated with AMP and VSFM describe the object in its entirety, although the point density decreases at the edges of the rock sample in elevation, owing to a non-optimal acquisition geometry.
In contrast, with the same image datasets, the SCANN3D application generates deformed models of the rock sample (Figure 6). To better assess the deviation of these products from the real object, the reference point cloud, produced as described in the previous section, was used to perform statistical analyses on the models obtained from images acquired with the smartphone.
Appl. Sci. 2019, 9, x FOR PEER REVIEW 11 of 21

C2C Distance Comparison
To assess the quality of the SfM approach applied to smartphone images, and its correspondence to reality, the distance between each generated cloud and the reference cloud was computed. To this end, the open source software CloudCompare [51] was used, which performs this estimate with its Cloud-to-Cloud (C2C) distance tool. The C2C tool uses a nearest neighbor search to compute the Euclidean distance between each point of the compared cloud and the nearest point of the reference cloud.
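The core of the C2C comparison is a nearest-neighbour search. A minimal brute-force sketch follows; it illustrates the principle only and is not CloudCompare's implementation, and the sample points are hypothetical.

```python
import math


def c2c_distances(compared, reference):
    """Distance from each point of `compared` to its nearest neighbour in `reference`.

    Brute-force O(N * M) search for clarity; real tools use a spatial index
    (for example an octree or k-d tree) to handle millions of points.
    """
    return [min(math.dist(p, q) for q in reference) for p in compared]


# Hypothetical data: a flat reference grid with 1 mm spacing.
reference = [(x, y, 0.0) for x in range(10) for y in range(10)]
compared = [(2.0, 3.0, 0.4), (5.5, 5.0, 0.0), (9.0, 9.0, -1.2)]
print([round(d, 2) for d in c2c_distances(compared, reference)])  # → [0.4, 0.5, 1.2]
```

Note that the measure is asymmetric: swapping the compared and reference clouds generally gives different distances.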
At this point, the models generated by SCANN3D had to be scaled and registered in a reference system consistent with the other models. CloudCompare offers several tools for this: a rough registration can be performed through pairs of equivalent points identified in the point cloud to be registered and in the reference model; subsequently, the automatic method based on the Iterative Closest Point (ICP) algorithm [52] allows the two models to be finely registered. This procedure was performed for both models derived from SCANN3D, with a final RMSE of about 5 mm.
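The fine registration step alternates two operations until the misfit stops decreasing: matching each point to its nearest neighbour, then computing the rigid transform that best aligns the matched pairs. The sketch below shows this point-to-point ICP loop in 2D as a deliberate simplification; CloudCompare works in 3D, and for the SCANN3D models a scale factor also had to be estimated, which this sketch omits. The helper names, grid data and tolerances are hypothetical.

```python
import math


def nearest(p, cloud):
    """Closest point of `cloud` to p (brute force)."""
    return min(cloud, key=lambda q: math.dist(p, q))


def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src onto dst (2D)."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    # Cross-covariance terms of the centred coordinates give the optimal angle.
    dot = sum((p[0] - sx) * (q[0] - dx) + (p[1] - sy) * (q[1] - dy) for p, q in zip(src, dst))
    cross = sum((p[0] - sx) * (q[1] - dy) - (p[1] - sy) * (q[0] - dx) for p, q in zip(src, dst))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def apply_rigid(points, theta, tx, ty):
    """Rotate points by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def rms_to_target(points, target):
    """RMS of nearest-neighbour distances from points to target."""
    return math.sqrt(sum(math.dist(p, nearest(p, target)) ** 2 for p in points) / len(points))


def icp_2d(source, target, max_iter=30, tol=1e-12):
    """Point-to-point ICP: alternate nearest-neighbour matching and rigid fitting."""
    current = list(source)
    err = rms_to_target(current, target)
    for _ in range(max_iter):
        matches = [nearest(p, target) for p in current]
        current = apply_rigid(current, *best_rigid_2d(current, matches))
        new_err = rms_to_target(current, target)
        converged = err - new_err < tol
        err = new_err
        if converged:
            break
    return current, err


# Hypothetical data: a 5 x 5 grid and a slightly rotated and shifted copy of it.
target = [(float(x), float(y)) for x in range(5) for y in range(5)]
source = apply_rigid(target, math.radians(1.0), 0.05, -0.04)
aligned, final_rms = icp_2d(source, target)
print(f"initial RMS {rms_to_target(source, target):.4f}, final RMS {final_rms:.2e}")
```

ICP only converges to the correct solution from a reasonable starting position, which is why a rough point-pair registration is performed first.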
Since all of the point clouds were quite noisy in the area of the surface supporting the rock sample, the cloud portions that were not part of the study object were removed manually. The C2C tool was run treating points more than 5 mm from the reference model as outliers, since the errors of interest are below this limit. Figure 7 shows some significant results of this procedure as histograms, statistical distributions and color scales displayed directly on the point clouds. The computed differences (Figure 7) confirmed the metric accuracy of the models estimated by AMP and VSFM, and showed that the largest discrepancies, greater than 5 mm, occur on the support surface. This error can be attributed to the noise affecting the point clouds, caused by the reflective support surface combined with an external light source close to the test site, which probably led to an incorrect identification of matching points. Table 8 reports the statistical parameters derived from the C2C analysis, such as the maximum distance, mean distance and standard deviation. These results show that the three-dimensional models obtained with AMP and VSFM have better statistical values, lower than or on the order of one millimeter. Furthermore, for the model obtained from SCANN3D with the 12-image dataset, although the mean distance from the reference model is less than 2 mm, the representation of the object is visibly incomplete, as shown in Figure 7. For a better understanding of these discrepancies, a further analysis of the error distributions was carried out, as reported in Table 9. The results show that more than 84% of the points of the clouds produced with AMP and VSFM deviate from the reference data by less than 2 mm, whereas the SCANN3D application did not achieve these results.
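The quantities reported in Tables 8 and 9 (maximum, mean and standard deviation of the distances, and the share of points within 2 mm) can be derived from the per-point C2C distances as in the sketch below. Computing the share after discarding the points beyond the 5 mm outlier threshold is an assumption of this sketch, and the sample distances are hypothetical.

```python
import statistics


def c2c_summary(distances_mm, outlier_mm=5.0, tolerance_mm=2.0):
    """Summary statistics of C2C distances after discarding outliers (hypothetical helper)."""
    kept = [d for d in distances_mm if d <= outlier_mm]
    if not kept:
        raise ValueError("all distances exceed the outlier threshold")
    return {
        "outliers": len(distances_mm) - len(kept),
        "max_mm": max(kept),
        "mean_mm": statistics.fmean(kept),
        "std_mm": statistics.pstdev(kept),
        # Assumption: share of retained points closer than the tolerance.
        "share_below_tol": sum(d < tolerance_mm for d in kept) / len(kept),
    }


print(c2c_summary([0.5, 1.0, 1.5, 2.5, 6.0]))
```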
The comparison between the point clouds produced from smartphone images and the model generated from the NIKON data shows that the variations are minimal and that the models are almost identical. The only exception is the model created with the SCANN3D app; in this case, the incompleteness of the model is not due to the performance of the smartphone camera but, rather, to the algorithms used by the application itself and to the computational performance of the smartphone.

Estimate of the Size of the Pixel on the Rock Sample
As shown, the procedure described so far can easily be carried out, even by inexperienced operators, thanks to the great potential of modern software. However, depending on the objective of the survey, it is of primary importance to establish, as the first key step, the final level of detail of the product to be obtained, namely the representation scale. In this research, it is fundamental to obtain 3D representations of objects at large or very large scales, in order to conduct specific geomorphological analyses. These particular objects, as well as some areas of underground caves, cannot always be inspected, or, for various reasons, prolonged on-site observation is not possible. It is therefore essential to plan the photogrammetric survey so as to obtain the desired level of detail or, conversely, to assess which resolution the available instruments can reach under the operating conditions of the site.
To this end, it is necessary to estimate the so-called Ground Sample Distance (GSD) [17] achievable by the camera of our smartphone device, namely the distance between the centers of two adjacent pixels expressed in object units, as a function of the acquisition distance. The GSD was introduced in aerial photogrammetry as fundamental information for flight planning [53]. However, by defining a target GSD for a specific terrestrial photogrammetry project, it is possible to calculate the most suitable acquisition distance to be maintained from the object of study. Table 10 lists the characteristics of the processed data as shown in the AMP report; these are average values calculated over the images of the entire dataset. The GSD is a function of the acquisition distance, the focal length and the pixel size of the digital sensor, as expressed by the following formula:
$$\mathrm{GSD} = \frac{d \cdot p}{f}$$
where d is the image acquisition distance, p is the pixel pitch and f is the focal length.
The GSD values can be estimated by taking into account the characteristics of the OnePlus 5 smartphone and the focal length calculated in previous sections, in order to define the model resolution that can be reached with this specific device. Since the GSD is directly proportional to the acquisition distance, if an acquisition distance of 37.7 cm yields an average GSD of 0.0920 mm/px, then at a distance of 100 cm the GSD will be 0.2440 mm/px, and at 200 cm it will increase to 0.4881 mm/px. However, for rather compact objects for which large-scale representations are desired, such short acquisition distances can produce significant GSD differences between the parts of the object nearest to and farthest from the camera. During acquisition planning, it is therefore recommended to specify both a target GSD and a minimum GSD, to allow some flexibility for variations in the object's geometry.
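The relation GSD = d · p / f and the proportional extrapolation described above can be turned into a small planning helper. The sketch below reproduces the 100 cm and 200 cm values from the 37.7 cm reference and inverts the relation to find the distance for a target GSD; the function names are hypothetical, and the sensor values in the last lines (1.4 µm pitch, 4 mm focal length) are placeholders, not the OnePlus 5 specifications.

```python
def gsd(distance_mm, pixel_pitch_mm, focal_mm):
    """Ground sample distance in mm/px: GSD = d * p / f."""
    return distance_mm * pixel_pitch_mm / focal_mm


def distance_for_gsd(target_gsd_mm, pixel_pitch_mm, focal_mm):
    """Acquisition distance in mm that yields the target GSD: d = GSD * f / p."""
    return target_gsd_mm * focal_mm / pixel_pitch_mm


def rescale_gsd(gsd_ref, d_ref, d_new):
    """GSD at d_new from a known (d_ref, gsd_ref) pair, by direct proportionality."""
    return gsd_ref * d_new / d_ref


# Reference values from the text: 0.0920 mm/px at 37.7 cm.
for d in (1000, 2000):
    print(d, "mm ->", round(rescale_gsd(0.0920, 377, d), 4), "mm/px")
# → 1000 mm -> 0.244 mm/px
# → 2000 mm -> 0.4881 mm/px

# Hypothetical sensor (1.4 um pitch, 4 mm focal length), not the OnePlus 5 values.
print(round(distance_for_gsd(0.1, 0.0014, 4.0), 1), "mm for a 0.1 mm/px target")
```

The proportional form needs no sensor data at all, because the pixel pitch and focal length cancel out in the ratio between two distances.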

A Real Case Study: The Bossea Cave
As the procedure described above gave promising results, the same methodology was applied to the 3D modeling of three surfaces inside a real cave: the Bossea Cave. In this case too, the OnePlus 5 smartphone was used to capture several videos (about one minute for each area). All the frames, with a resolution of 3840 × 2160 pixels, were extracted, and a subset of them was selected for further processing for the 3D model reconstruction. During the video acquisition, no flash or other additional lights were used besides those already present in the cave since, although the lighting conditions were not optimal, we wanted to avoid increasing the reflectivity of the wet surfaces. As we could not cross the parapet (for safety reasons), we tried to acquire as much information as possible by moving the smartphone to different heights and varying the acquisition angle.
Starting from the videos, a limited number of frames have been selected in order to guarantee sufficient overlap between them.

Data Processing
The images were then processed with AMP, as it combines good accuracy of the result with control over the various processing phases and a high density of the final 3D data. To guarantee a certain speed of data processing and 3D reconstruction without sacrificing the completeness of the object description, a Medium quality setting was used for the image alignment and a High quality setting for the point cloud densification. Figure 8 and Table 11 show the characteristics of the resulting 3D dense point clouds.
The same datasets were subsequently processed with the SCANN3D application. However, likely due to the larger number of images and the particularly poor lighting conditions, the application could reconstruct the models only partially, with limited quality in terms of the number of points and model completeness, even at the highest quality setting. Nevertheless, it should be pointed out that SCANN3D was not designed for scientific purposes and requires specific environmental conditions and acquisition strategies in order to model an object correctly from images.

Analysis of the Results
The geometric quality of the entire point cloud was assessed using, as reference data, a terrestrial laser scanner (TLS) survey of the entire cave carried out for a previous project.
The instrument used was the FARO Focus 3D, a well-known phase-shift laser scanner with a distance accuracy of up to ±0.002 m and a range from 0.6 m to 120 m. For the registration of the LiDAR point clouds, some markers spread over the area were measured with topographic instruments. As usual, a reference network was first established with Global Navigation Satellite System (GNSS) techniques to obtain high-precision vertices. The two vertices were surveyed with a dual-frequency, multi-constellation GNSS receiver in static mode, occupying each point for about 60 min. Afterwards, the total station was used to establish a network of points and to measure the position of the markers. All measurements were subsequently adjusted with the MicroSurvey StarNet v.7.0 software to obtain the final coordinates, which were determined with millimeter accuracy. The final model was obtained with the FARO Scene software, with a registration error of about 1 cm, compatible with the accuracy of the topographic network and of the instrument itself.
The photogrammetric models were subsequently compared with the LiDAR point cloud using CloudCompare. To this end, each model had to be scaled, rotated and translated into a reference system compatible with that of the reference model, so the ICP algorithm implemented in CloudCompare was used again. The Euclidean distances between the points of the two representations were then estimated. Figure 9 and Table 12 show some significant results of this procedure as histograms and statistical distributions. The computed differences confirmed the centimetric accuracy of the models generated by AMP, and showed that the largest discrepancies occur where there are holes in the LiDAR point cloud. Other errors can be attributed to the noise affecting the point clouds, due to the low light and large shaded areas, as well as to surfaces made reflective by the high humidity. To improve the bundle block adjustment and facilitate the correct alignment of the images, some topographically measured markers should be used. Furthermore, some of these markers could be used as check points for a further evaluation of the accuracy of the dense reconstruction.

Conclusions
The aim of this work was to analyze and propose a low-cost technology and methodology for producing 3D representations of objects of geomorphological interest. The aspects examined cover the entire process, from the acquisition phase to the analysis of the accuracy of the final result. The issue to be tackled was to find solutions able to guarantee a certain accuracy of the data while fulfilling the requirements of portability, low cost, speed of data acquisition and flexibility for use in complex environments. Nowadays, smartphones are accessible to everyone in the mass market and, although they were not designed for metric purposes, they solve some of the problems related to the acquisition of 3D data.
However, since these devices are generally designed for amateur use, it is crucial to study their characteristics, analyze their behavior and test their performance in depth before using them in surveying operations. Camera calibration was one of the topics addressed in this research, because in photogrammetry a good accuracy of the final model depends on the correct estimation of the internal parameters of the camera.
Attention was paid to planning the acquisition procedures, so as to capture a limited number of sufficiently overlapping images for the three-dimensional reconstruction. Moreover, it has been shown, through the estimation of the GSD, how the acquisition geometry influences not only the accuracy but also the resolution of the 3D model.
In particular, the proposed methodology was first assessed in a controlled environment, on a single small object (large scale), and subsequently tested on real cave walls (small scale). Many tests were performed to identify the limits and potential of the proposed solution in generating three-dimensional models of a rock sample, combining different image datasets with the Structure from Motion approach, as implemented in commercial (AMP) and open source (VSFM) software and in a smartphone application (SCANN3D). The software tools used offer ease of use and automation in data processing and produce suitable products in a fairly similar way, although there are some slight differences in the results, including for the free and open source software. The potential of the proposed approach was evaluated by analyzing the RMSEs with respect to topographically measured control points and check points and by comparing the dense point clouds with the reference model obtained from a professional camera. The results showed that the performance of the different software tools differs considerably. On the one hand, data processed with software running on computers reach accuracies of about 1 mm on control points, as also confirmed by the comparison with the reference point cloud; this makes these models suitable for scientific analyses. On the other hand, the algorithms used by the SCANN3D application need to be optimized in order to generate a 3D model locally on a smartphone platform. Clearly, "mobile" photogrammetry strongly depends on the computational power of the devices and, above all, hardware more powerful than the current generation of high-end smartphones is still needed if the goal is to generate 3D models in real time.
Given the good results of this first attempt to use smartphones for generating 3D models of objects of geomorphological interest, some preliminary tests were conducted to investigate how the proposed methodology behaves under different environmental conditions, such as those of a real underground cave, where various operating conditions can affect the results:
- lighting conditions: poor lighting requires a higher ISO, with a consequent loss of image quality; on the other hand, artificial lights and strong light variations modify the radiometry and create both reflections on wet surfaces and shadows, which negatively influence the reconstruction of the point cloud;
- freedom of movement: the width of the paths, the distance from the walls, and the possibility of reaching and seeing all the spaces influence the data acquisition operations;
- humidity and temperature: these may affect the functionality of the sensors.
In this case too, the obtained models showed that, by following a rapid acquisition process and using a limited number of inputs from a mass-market sensor, it is possible to generate representations at scales large enough for the main geomorphological analyses. The procedure should be further tested with different types of smartphones, which, in their most recent versions, integrate high-quality sensors and powerful processors.
In the future, this study will focus on exploring the possible use of smartphone data in underground caves for real-time navigation, where GNSS positioning is not available and the environment may be critical or have limited accessibility. To this end, a possible solution could come from visual odometry, which can compute in real time both the motion of the smartphone and the 3D structure of the scene from the acquired 2D frames.