Documentation of Complex Environments Using 360° Cameras. The Santa Marta Belltower in Montanaro

Abstract: Low-cost and fast surveying approaches are increasingly being deployed in several domains, including the field of built heritage documentation. In parallel with mobile mapping systems, uncrewed aerial systems, and simultaneous localization and mapping systems, 360° cameras and spherical photogrammetry are research topics attracting significant interest for this kind of application. Although several instruments and techniques can be considered consolidated approaches in documentation processes, the research presented in this manuscript focuses on a series of tests and analyses using 360° cameras for the 3D metric documentation of a complex environment, applied to the case study of an XVIII-century belltower in the Piemonte region (north-west Italy). Both the data acquisition and data processing phases were thoroughly investigated, and several processing strategies were planned, carried out, and evaluated. Data derived from consolidated 3D mapping approaches were used as a ground reference to validate the results derived from the spherical photogrammetry approach. The outcomes of this research confirmed, under specific conditions and with a proper setup, the possibility of using 360° images in a Structure from Motion pipeline to meet the expected accuracies of typical architectural large-scale drawings.


Introduction
The documentation of built heritage and, in general, of cultural heritage, is a complex process that poses a series of issues and has specific rules and requirements [1][2][3]. Each heritage asset has its own specific features and, depending on its state of conservation and the status of the knowledge process, the documentation project is characterized by a tailored structure and organization. The documentation of a heritage asset is, or at least should always be, the first phase of the knowledge process, and thus it is crucial to take into account the several aspects that contribute to and influence its survey. The first aspect to consider relates to the final expected accuracy of the derived metric products and their level of detail, which also depends on the specific goal of the subsequent analyses carried out by different research areas with different expertise [4,5].
Furthermore, time and cost are two other factors that highly influence the design of a documentation project (not only in the built heritage field). Resources and time available both in the field and in the post-processing, analysis, and interpretation phases have a significant impact on the overall design of the heritage documentation [6].
Considering the requirements of the multi- and inter-disciplinary approach that is most often required for a complete knowledge process, the design of the survey output products should be carefully considered and evaluated a priori, with a two-fold goal: (i) to maximize the engagement of all of the experts involved in the knowledge process and (ii) to ensure the required information will be embedded in the survey output products [7,8].
Finally, a crucial step in the documentation process of built heritage is connected with its dissemination, which can be fostered by highly informative survey products [9,10].

1. The one mainly developed by the Università Politecnica delle Marche under the direction of Prof. Gabriele Fangi and commonly defined as multi-image spherical photogrammetry, panoramic spherical photogrammetry, or spherical photogrammetry.
2. The evolution of the first approach due to the developments of Structure from Motion (SfM) algorithms.
The first approach is well described in the literature [26][27][28][29] and the development of this methodology was mainly related to the idea of exploiting the advantages of spherical images: low-cost, rapid, and complete coverage of the acquired imagery. This method can be described as an analytical approach for the processing of spherical images in an equirectangular projection and its workflow is well described and summarized in [28].
Due to the development of SfM algorithms, 360° images gained new popularity in recent years and became a research topic that has been well exploited by several authors [30][31][32][33][34][35], who investigated the different issues associated with using this type of image in an SfM-based workflow.
In this manuscript, the latter approach is referred to as spherical photogrammetry (SP).
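The equirectangular projection mentioned above can be illustrated with a minimal sketch that maps a pixel of a spherical image to a unit viewing direction. The function name and the axis convention (longitude running from -pi at the left edge to +pi at the right edge, latitude +pi/2 at the top row, z axis pointing at the image center) are assumptions chosen for illustration; they are not taken from the cited works or from any specific software.

```python
import math


def equirect_pixel_to_ray(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit direction (x, y, z).

    Assumed convention (hypothetical, for illustration only):
    - u in [0, width) spans longitude from -pi (left) to +pi (right)
    - v in [0, height) spans latitude from +pi/2 (top) to -pi/2 (bottom)
    - the image center looks along +z, with y pointing up
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)
```

Under this convention, every pixel corresponds to a direction on the unit sphere rather than to a point on a planar sensor, which is why a dedicated spherical camera model is needed in place of the usual frame camera model.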

Remote Sens. 2021, 13, x FOR PEER REVIEW

Case Study
The municipality of Montanaro (Figure 1) is located 30 km northwest of Turin (Piemonte region, Italy) and hosts the Santa Marta belltower, a valuable built heritage asset designed by Bernardo Antonio Vittone and built between 1769 and 1772. The main peculiarity of this complex of buildings is that it represents a unique built heritage designed by a single architect, Bernardo Antonio Vittone. The structures built around the belltower, the fulcrum of the composition, are the town hall, the brotherhood of S. Marta, and the parish church. Vittone's project represents well the ideal integration between the secular community and the sacred space in an XVIII-century municipality [36]. The belltower is approximately 48 m high; it becomes more slender toward the top and has a peculiar internal spiral stairway made of stone.

Topographic Network and Control Points
Following the consolidated operative practice, the first operation in the field was the creation and measurement of a network of vertices to properly define a reference system supporting the subsequent phases of data acquisition, processing, and metric control. The height of the tower, the proximity of surrounding buildings, and the limited availability of locations with good satellite visibility all influenced the design and setup of the topographic network. Five vertices were materialized in the square in front of the tower and three on intermediate floors (also allowing the measurement of indoor ground control points). Due to the conformation of the urban area near the tower, it was possible to adopt a Global Navigation Satellite System (GNSS) static technique only for two vertices in front of the tower, whereas each of the five vertices was measured with a total station. Nevertheless, to georeference the network to the ETRF2000 UTM Zone 32N cartographic system, the GNSS data measured in the field were combined with the observations from the network of Continuously Operating Reference Stations (CORS), allowing a more robust and precise computation of the vertices' coordinates. To complete this operation, the data from the CORS of the interregional positioning service SPIN3 of Piemonte, Lombardia, and Valle d'Aosta [37] were used. Specifically, the permanent stations of Torino, Crescentino, and Cuorgnè were employed.
A representation of the topographic network is shown in Figure 2b, and the position of the CORS used to compute the coordinates of the two vertices measured by GNSS in the field is shown in Figure 2a.
The second phase of fieldwork was the positioning/selection and measurement of several ground control points to be used for the subsequent data processing and data validation phases. Ground control points were represented either by artificial coded paper targets positioned directly on the wall surface or by readily identifiable natural features. They were measured on both the exterior and the interior of the belltower to ensure a good connection between indoor and outdoor, and to grant good metric control of the indoor data acquisitions.
A total of 99 ground control points with centimetric accuracy were measured with traditional topographic techniques by means of single side shots from a total station: 67 points outdoor and 32 points indoor (the spatial distribution of ground control points is shown in Figure 3). For the topographic survey, two Geomax Zenith 35 GNSS receivers and a Leica Viva MS total station were used.

Uncrewed Aerial Systems (UAS)
The UAS acquisition performed in Montanaro was previously described and analyzed in another published work [38]. For the purposes of this research, these data were used to integrate and partially validate the other datasets, and for the sake of completeness and easy reference a short description of their acquisition and processing is reported hereafter. Data derived from the UAS acquisition were crucial to verify the congruence and accurate connection between the indoor and outdoor datasets, particularly on the belltower windows.
A DJI Phantom 4 Pro was used for the acquisition [39] (mechanical shutter camera equipped with a 1" CMOS 20 MP sensor and a multi-frequency, multi-constellation GNSS receiver) and, due to the urban conformation and the type of acquisitions to be completed, the flights were executed manually. For each façade, images were acquired starting from the ground and moving to the top of the tower, maintaining an average object-sensor distance of 5 m. Both nadiral (optical axis perpendicular to the façade average plane) and oblique (optical axis at 45° with respect to the façade) images were acquired for each façade. The same acquisition scheme (Figure 4) was replicated for each of the four edges of the belltower; a total of 543 images were captured with an expected GSD of 3 mm.

Terrestrial Laser Scanner (TLS)
Acquisitions with TLS were used as a ground reference for the other datasets and were not designed as a complete survey of the belltower, because it would have been highly time consuming due to the conformation of the environment. A Faro Focus 3D X 330 TLS was used (the main specifications are reported in Table 1), and scans were acquired in two different fieldwork phases. The first TLS acquisitions covered the exterior part of the belltower, including several scans of the ground floor, whereas the second acquisition covered the interior of the belltower.
During the first campaign, eleven scans were acquired (Figure 5a); in the second campaign, seven scans were acquired (Figure 5b).

Simultaneous Localization and Mapping (SLAM)
During the fieldwork, other range-based acquisitions were performed with a ZEB Revo RT [40,41]. Despite representing a relatively new technology and not being as consolidated as TLS, this system has proven to be suitable for heritage documentation, at least at some representational scales [22,42,43]. Mobile mapping systems (MMS) and, more specifically, those based on SLAM algorithms, have been a popular topic in geomatics research in recent years. Data collected in the field with the ZEB Revo RT were validated in previous research [38] and were fundamental, together with the UAS data, for the delivery of the 2D architectural drawings supporting the restoration projects of the belltower. They were used in this work as a fast-surveying approach comparable to the SP methodology in terms of acquisition time; in this scenario, one of the specific objectives of this research was to evaluate if they are also comparable in terms of accuracy and completeness of information.
Four acquisitions with this system were thus carried out, following closed paths as suggested by the SLAM data acquisition best practices [44]. The limited interior size of the tower and the narrow space available for the operator (limiting the movements) were particularly challenging, not only for the SLAM acquisitions, but also for those using the 360° cameras. The main specifications of the SLAM system are reported in Table 2, and an example of one of the acquisitions is shown in Figure 6.

Figure 6. Example ZEB Revo RT acquisition: the raw point cloud (a) and the paths followed during the survey operations (b).

360° Cameras
Two different 360° cameras were used for acquiring images of the Montanaro belltower: the GoPro Fusion and the Kandao Qoocam 8K (Figure 7); the main specifications of the two devices are reported in Table 3.
Data acquisition with a 360° camera is generally easier than with traditional close-range photogrammetry (CRP) approaches based on a frame camera; nevertheless, it is also important to carefully design the acquisition strategy for this kind of sensor. As reported in [45], three main strategies can be adopted to acquire 360° images for SP processing: (i) still images, (ii) time lapse, and (iii) video. In the Montanaro experience, the third strategy, based on the acquisition of videos, was adopted because it was the fastest approach and offered the best time-cost balance. For each of the two cameras, videos were recorded at the highest available resolution: 5.2K (30 fps) for the GoPro Fusion and 8K (30 fps) for the Kandao Qoocam 8K. The camera was mounted on a single pole support and held around 40-50 cm above the operator's head; this configuration reduced the presence of the operator in the field of view (FoV) of the cameras. It was also important that the operator continuously monitored the surrounding environment with respect to the camera position during the acquisition to avoid collisions with other elements; this was particularly critical for the Montanaro indoor acquisitions. The interior conformation of the tower and the narrow spaces of the spiral stairway posed a series of issues during the acquisitions, especially in the upper part of the building. In this portion of the structure, which hosts the mechanism of the four clocks of the belltower and connects the lower part with the bell chamber, the space for the operator to move is reduced and only a small trapdoor connects the two sections of the tower.
Thus, the distance between the camera and walls was reduced, and moving across the trapdoor also required the operator to reduce his distance with respect to the camera. Moreover, this area is characterized by poor lighting due to the lack of windows. All of these aspects had a strong impact in the acquisition and processing phases, as discussed further below.
The two videos were acquired following a roundtrip walk starting from the square in front of the belltower, entering the tower, walking the helicoidal stairway up to the bell chamber, and then returning via the same path (Figure 8). Each video has a duration of around 7 min and a file size (after the stitching phase) of 30 GB for the GoPro and 13 GB for the Kandao.
The difference in file size between the videos is due to the different video formats used by the two video stitching applications. The Qoocam 8K videos were saved in a compressed ".mp4" format and thus, despite having a higher resolution, they were smaller than the Fusion videos, which were saved in a ".mov" format.

Data Processing
The information reported in the following paragraphs is mainly dedicated to the processing of the spherical images adopting SfM approaches, and represents the central topic of this manuscript. However, for the sake of completeness, some notes on the processing steps followed for the other datasets involved in the validation of the SP approach are also reported.

UAS
Data acquired by means of UAS were processed following the standard SfM pipeline using the commercial solution Agisoft Metashape (v. 1.7.2). The processing was previously described in [38] and the main results are reported in Table 4. The design of the acquisition, which followed the best practices for this kind of survey, led to high-resolution 3D models (Figure 9). The data derived from the UAS survey were used, as previously reported, for the construction of 2D architectural drawings (plans, facades, and sections) useful for documenting the entire belltower. Furthermore, they were exploited to evaluate the metric and geometric quality of the other datasets.

TLS
Regarding the TLS dataset, the Faro Scene software was used to process the scans. For the purposes of this research, only the indoor scans were considered, whereas, as previously reported, the UAS dataset was used as a ground reference for the exterior part of the tower. For six of the seven indoor scans, the processing followed a two-step approach. In the first step, the scans were preliminarily co-registered by performing a rough manual registration optimized using an Iterative Closest Point (ICP) algorithm. In the second step, the registered scans were georeferenced using the available ground control points. By comparison, the scan acquired in the bell chamber did not have sufficient overlap with the other scans and was therefore processed with a single-step approach, using the ground control points to georeference it directly. The quality of the registration of the indoor dataset is reported in Table 5.

SLAM
The first step of the processing dedicated to the data acquired with the Geoslam ZEB Revo RT consisted of an optimization of the four point clouds, before the second phase aimed at georeferencing the data in the common reference system. The data processing phase was carried out using the dedicated software solution (Geoslam Hub), following the standard workflow with a limited intervention from the operator.
The four scans were first roughly manually aligned and then processed with the merge function of the Geoslam software. Using this option, it is possible to align all the scans in the same local reference system, and a second optimization of the scans is also performed. This function allows correction of gross or drift errors of the raw scans that are not always visible by means of a qualitative evaluation of the data. Acquiring a redundant number of scans with a high degree of overlap, particularly in complex environments such as that of the belltower, represents a good strategy to fully exploit the merge function, optimize each point cloud, and obtain a complete model of the surveyed object.
The final phase of the SLAM processing is represented by the georeferencing of the data. This phase was completed using the LiDAR dataset as a ground reference. After a first coarse registration of the two datasets, an ICP registration was performed using the CloudCompare software and adopting the LiDAR data as the blocked reference. The RMSe (root mean square error) achieved after this operation presents a mean value of 0.04 m, in line with the expected precision of the instrument [22].

Spherical Cameras
Before moving to the photogrammetric processing of 360° images, it is important to analyze the raw data of the two cameras. The GoPro Fusion is equipped with two microSD cards, one for each sensor, where two different videos for each acquisition are stored independently. The Kandao Qoocam 8K has an internal memory and can be equipped with only a single SD card: a single video is recorded for each acquisition, embedding two different tracks, one for each sensor. To work with the single video acquired by the Qoocam, it is thus necessary to undertake further preprocessing to split the two video tracks into two separate files. This task was completed with a command line script of the open-source solution FFmpeg [46].
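The script used for this step is not reported in the text, so the following is only a minimal sketch of how the two video tracks of a dual-track file could be separated with FFmpeg, wrapped in Python for convenience. It assumes that the container exposes the two sensors as video streams 0:v:0 and 0:v:1; the file names and function names are hypothetical.

```python
import subprocess


def build_split_command(src, out_first, out_second):
    """Build an FFmpeg command that copies each video stream to its own file.

    Assumption: the input holds two video streams (one per sensor),
    addressed as 0:v:0 and 0:v:1. "-c copy" avoids re-encoding.
    """
    return [
        "ffmpeg", "-i", src,
        "-map", "0:v:0", "-c", "copy", out_first,
        "-map", "0:v:1", "-c", "copy", out_second,
    ]


def split_tracks(src, out_first, out_second):
    """Run the command; requires the ffmpeg executable to be on the PATH."""
    subprocess.run(build_split_command(src, out_first, out_second), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_split_command("qoocam.mp4", "sensor_a.mp4", "sensor_b.mp4")))
```

Copying the streams instead of re-encoding them keeps the original image quality, which matters when the frames are later used for tie point extraction.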

The Stitching Phase
The first choice when dealing with the photogrammetric processing of data collected with a spherical camera is whether to proceed with the stitching phase.
Image, or video, stitching is the technique that allows multiple single images or video frames to be combined into a mosaicked virtual image or video. The main revolution in image stitching is related to the work of Szeliski and Shum [47] at the end of the 1990s, which was further developed in the following years [48][49][50][51]. The stitching phase can be performed automatically with several software solutions, both commercial and open-source. However, in recent years, commercially available 360° cameras have been provided with their own dedicated software solution for the stitching phase. This is also the case for the GoPro Fusion and the Kandao Qoocam 8K, which are provided with GoPro Fusion Studio and Qoocam Studio, respectively. It should be noted that, in general, these dedicated software solutions are limited in terms of customization, and that few options are available during the stitching. This reduces the possibility of coping with the most common stitching issues, i.e., the parallax effect, different exposure times among images, and the ghosting effect.
For the purposes of this research, it was decided to test both approaches, i.e., working with the data derived from the single 360° sensors and with the stitched 360° videos (processed in their own dedicated software solution at the maximum available resolution).

Frames Extraction and Processing Strategies
To further proceed with the processing phase, before adopting the standard SfM approach, it is necessary to extract single frames from the videos (both spherical and not stitched). This phase clearly has an impact on the data processing [52][53][54], especially in terms of number of images and overlapping. The interval adopted for frame extraction, i.e., the number of frames to be skipped, is linked to several aspects: original video framerate/quality, operator moving speed during the acquisition, illumination, desired overlap between images, scene conformation, etc.
The videos recorded with both the GoPro and the Kandao had a framerate of 30 fps and, for the purposes of this research, three different frame extraction intervals were tested and analyzed, according to previous experience [45,55,56] and taking the characteristics of the Montanaro belltower into account: one frame every 0.5 s (FPS = 0.5, skipping 14 frames), one frame every 1 s (FPS = 1, skipping 29 frames), and one frame every 2 s (FPS = 2, skipping 59 frames).
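The relation between the extraction interval and the number of skipped frames can be checked with a short sketch. It assumes that the three values (0.5, 1, and 2) denote the time in seconds between two extracted frames, which is the reading consistent with the reported skip counts at 30 fps; the function names and the 7 min duration used in the example (the approximate length of each video) are illustrative.

```python
def frames_to_skip(video_fps, interval_s):
    """Number of frames skipped between two extracted frames."""
    step = round(video_fps * interval_s)  # keep one frame every `step` frames
    if step < 1:
        raise ValueError("interval is shorter than one frame")
    return step - 1


def extracted_frame_count(duration_s, video_fps, interval_s):
    """Approximate number of frames extracted from a video of given length."""
    total_frames = int(duration_s * video_fps)
    step = frames_to_skip(video_fps, interval_s) + 1
    # Frames 0, step, 2*step, ... are kept (ceiling division).
    return (total_frames + step - 1) // step


if __name__ == "__main__":
    for interval in (0.5, 1.0, 2.0):
        print(interval, frames_to_skip(30, interval),
              extracted_frame_count(7 * 60, 30, interval))
```

For a video of about 7 min, the three intervals yield roughly 840, 420, and 210 frames per camera, which gives a sense of how strongly the choice affects the size of the image block and the overlap between consecutive frames.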
Finally, combining the frame extraction step with the stitching phase, five different processing strategies were adopted and tested for each of the two 360° cameras used in this research: three using spherical images after the stitching phase and two using the data derived from the single sensors. An example of the extracted frames is shown in Figure 10.

During the development of the different processing strategies, it was decided to adopt a self-calibration approach for the estimation of the interior orientation (I.O.) parameters of the adopted cameras. This choice was based on the experience gained in previous research works [55,56] and also because it was consistent with the aim of the fast-surveying approach of this research. Research related to the issues connected with the I.O. phase of this kind of sensor is under development and will likely lead to an enhancement of the achieved results.
In more detail, the five approaches for each of the two cameras (GP stands for GoPro and QC stands for Qoocam) have the following main characteristics:

Results
The data derived from the different strategies adopted for processing the two spherical datasets were carefully analyzed and validated from different perspectives, and detailed considerations are reported in the following sections. First, both a qualitative analysis of the completeness of the reconstruction of the overall belltower and a quantitative analysis of the metric accuracy, based on the overall RMSe on GCPs and CPs, were carried out.
A second group of analyses focused on the comparison between distances derived from total station measurements and the same distances extracted from the different 3D models generated by the five processing strategies.
Finally, the accuracy of the geometrical reconstruction provided by the different 360° approaches was also evaluated and validated by extracting planar and cross sections from the different models and comparing them with the data provided by the ground reference (UAS, TLS, and SLAM). The datasets used as ground reference are characterized by similar positional accuracies (a few centimeters), as detailed in the processing results of each technique; they can therefore be used for validation purposes.

Completeness, Metric Quality, and Geometric Reconstruction of the SP Approach
A first qualitative analysis was carried out on the completeness of the reconstruction provided by the different approaches. The completeness of the reconstruction of the belltower is mainly related to the phases of image matching, tie point (TP) extraction, and camera position estimation. The most critical point in this phase is the area that connects the bell chamber with the remainder of the structure by means of a trapdoor: it is a narrow area with low light conditions, and is therefore critical for the extraction of TPs and for image correlation. Figure 11 shows a graphical representation of the different levels of completeness provided by the tested approaches, and Table 6 shows that the levels of completeness are also linked with the number of points generated in the densification phase of the SfM processing.
Figure 11. Qualitative analysis of the completeness of the point cloud derived from the different approaches.
It should be noted that seven of the ten datasets were able to provide a complete reconstruction of the belltower. For the GoPro, the two datasets that failed in the reconstruction were GP3 (0.5 fps, stitched, round trip) and GP4 (1 fps, single camera, round trip). For both of these datasets, the main issue was probably an insufficient number of images in relation to the chosen approach. GP3 used stitched images; however, extracting one frame every 60 frames is probably not enough to ensure a reasonable overlap between the images in the upper part of the tower. By comparison, GP4 used single images extracted from each of the two sensors embedded in the camera; in this case, the number of extracted frames is also probably not sufficient to derive a correct correlation between images in the upper part of the tower. For the Qoocam, the situation was similar; however, only the single camera dataset (QC4) failed to deliver a complete reconstruction of the tower.
The fact that the 360° dataset (QC3) in this case performed better than that of the GoPro is probably simply related to the higher resolution of the camera itself.
Other interesting observations can be made concerning the number of 3D points generated during the densification phase in the different processing approaches (Table 6). For the 360° strategy, the number of points generated adopting a round trip or stitched approach in the case of 1 frame every 30 frames is almost the same for both cameras. The higher resolution of data collected from the Qoocam is also clearly visible in the higher number of points generated for each approach.
The metric validation of the proposed approaches was carried out in different steps: first, on the RMSe achieved for both GCPs and CPs after the processing of the different strategies; second, on some 3D distances between ground control points measured in the field and the same distances extracted from the 3D models generated with the different approaches; and, finally, on the 2D sections extracted from the different point clouds. The RMSe on both GCPs and CPs for the different approaches is reported in Table 7. The different number of GCPs and CPs used in the different projects is related to the completeness of the SfM process in the reconstruction of the belltower: some of the points used as ground control points are located in the area that was not reconstructed by some of the tested approaches. In general terms, the Qoocam performed better than the GoPro, with a lower RMSe value for each of the approaches, and particularly for the 0.5 and 1 fps strategies. As for the image matching phase, this is related to the higher resolution of the Kandao camera, which generally produces sharper images of higher overall quality.
For both cameras, the single-sensor approaches (4 and 5) were the most successful when considering the overall RMSe on GCPs and CPs. Almost all the approaches meet the requirements of a 1:200 nominal map scale, and the errors are comparable to those achieved during the processing of the ZEB REVO RT dataset.
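As an illustration of the indicator discussed above, a 3D RMSe over a set of GCPs or CPs can be sketched as follows. This is a minimal example under stated assumptions: the exact RMSe formulation adopted by the SfM software is not detailed in the text, so the common definition (square root of the mean squared 3D residual) is assumed here, and the function name and sample coordinates are hypothetical.

```python
import math


def rmse_3d(estimated, reference):
    """Root mean square of the 3D residuals between two point lists.

    `estimated` and `reference` are equally long sequences of (x, y, z)
    tuples in meters, paired by position (assumed layout).
    """
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("point lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xe, ye, ze), (xr, yr, zr) in zip(estimated, reference):
        squared_sum += (xe - xr) ** 2 + (ye - yr) ** 2 + (ze - zr) ** 2
    return math.sqrt(squared_sum / len(estimated))
```

Applying the same function separately to the GCPs (used in the adjustment) and to the CPs (kept independent) mirrors the distinction reported in Table 7.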
However, considering only the RMSe as a parameter to evaluate the overall metric accuracy of the processing may be misleading; further analyses were therefore performed.
The second analysis compared 3D distances derived from the 3D coordinates of the ground control points and the same distances derived from the point clouds achieved by the different photogrammetric processing approaches, to focus on the relative precision rather than on the absolute accuracy.
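The comparison of 3D distances can be sketched as follows. This is a hypothetical helper and not the procedure used by the authors: the function name, the sign convention (model minus field), and the coordinates in the usage note are assumptions made for illustration only.

```python
import math


def distance_discrepancy(field_a, field_b, model_a, model_b):
    """Compare one 3D distance measured in the field with the same
    distance extracted from a photogrammetric model.

    Each argument is an (x, y, z) tuple in meters. Returns the field
    distance, the model distance, and their difference (model - field).
    """
    d_field = math.dist(field_a, field_b)
    d_model = math.dist(model_a, model_b)
    return d_field, d_model, d_model - d_field
```

Because only the distance between two points is compared, a rigid shift of the whole model leaves the result unchanged: with field points `(0, 0, 0)` and `(0, 0, 10)` and model points `(1, 1, 1)` and `(1, 1, 11)`, the difference is zero even though every model coordinate is displaced by more than one meter. This is why the test addresses relative precision rather than absolute accuracy.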
A total of four distances were considered; their positions are shown in Figure 12 and the values are reported in Table 8.
Figure 12. Graphical representation of the four 3D distances considered to evaluate the accuracy of the photogrammetric processing.
Table 8. Computed values of the 4 distances for the different processing approaches compared to the related measurements in the field. The difference between the value measured in the field and the one extracted from the photogrammetric processing is reported in brackets. Missing values are due to the incomplete reconstruction for some approaches.
The distance differences are in the range of a few centimeters, in line with the results achieved for the RMSe.
However, it should be considered that, due to the conformation of the belltower and the organization of the survey, ground control points are not present in every section of the building and are not homogeneously distributed. This configuration can lead to some errors in the overall evaluation of the accuracy of the different processing strategies; thus, another analysis was completed through the extraction of cross and planar sections. The sections were extracted using the PointCab software by generating a thin section from the different point clouds derived from the 360° dataset. Moreover, three other point clouds were used as a ground reference: the LiDAR, the UAS, and the SLAM point clouds.
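The idea of a thin horizontal section can be illustrated with a generic sketch. This is not the algorithm implemented in PointCab, whose internals are not described in the text: the function name, the slab-based selection, and the thickness value in the usage note are assumptions.

```python
def horizontal_slice(points, z_center, thickness):
    """Keep the points lying in a thin horizontal slab of a point cloud.

    `points` is an iterable of (x, y, z) tuples in meters. Points whose
    height is within half the thickness of `z_center` are projected onto
    the XY plane and returned as (x, y) tuples.
    """
    if thickness <= 0:
        raise ValueError("thickness must be positive")
    half = thickness / 2.0
    return [(x, y) for x, y, z in points if abs(z - z_center) <= half]
```

For example, `horizontal_slice(cloud, z_center=1.0, thickness=0.05)` keeps the points within 2.5 cm of the chosen height. Using the same `z_center` and `thickness` for every point cloud corresponds to adopting one set of parameters for all the sections, so that the resulting profiles can be compared directly.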
Each of the sections was generated by adopting the same set of parameters and was then imported into AutoCAD to be transformed into 2D polylines. The different polylines were then compared to assess any possible discrepancies that were not evident from the previous analyses. Furthermore, it was also possible to evaluate the ability of the different datasets to reconstruct the geometry of the belltower in comparison with the more consolidated techniques. Three horizontal sections (Appendix A, Figures A1-A3) were extracted at different heights, covering both areas where GCPs were measured in the field and areas where they were not present. Finally, a vertical cross section of the whole belltower was extracted (Appendix A, Figures A4 and A5). An example of a horizontal section is shown in Figure 13.
These sections were crucial to underline some issues in the overall photogrammetric processing of the 360° images that were not evident when analyzing other parameters, such as the RMSe on ground control points or 3D distances. It is clear that some major misalignments or drift errors can occur in the areas without ground control points. This is especially visible in processing approaches 1, 2, and 3, which were based on the use of 360° stitched images; also in this case, the Qoocam performed better thanks to its higher resolution.
In general, approaches 4 and 5, which were based on the use of the single images derived from each sensor, present a lower deviation compared with that of the 360° approach.
It can be observed that each of the five approaches leads to a good performance (in terms of geometrical reconstruction) where ground control points are present to assist the SfM processing.

Processing Time
Despite the availability of powerful desktop computers and the enhancements of SfM algorithms and software, processing time remains a crucial issue in the overall photogrammetric pipeline. A comparison of the processing times of the different strategies tested is reported in Table 9. In general, processing the Qoocam 8K dataset requires more time due to the higher resolution of the images extracted from the videos.
The five strategies follow the same trend for both the GoPro Fusion and the Qoocam. Concerning the three 360° approaches (1, 2, and 3), it is possible to reduce the number of images, and thus the processing time, by adopting two different solutions: using only the one-way acquisition (2) or doubling the step for frame extraction (3). The two strategies that use the data of the single cameras (4 and 5) generally require more time than the 360° approaches. Nevertheless, these considerations were relatively predictable and need to be related to the other results reported in this section for a comprehensive assessment.

Discussion and Conclusions
The research presented in this manuscript focused on evaluating the possibility of using the data acquired by two different 360° cameras to document a complex heritage asset, the Santa Marta belltower. Starting from the experience gained in previous research, different implications connected with the development of an SfM pipeline were considered and analyzed from different perspectives, including the acquisition phase, the processing phase, and the generation and use of different added-value products. The same acquisition scheme was followed for the two 360° cameras tested in this research (GoPro Fusion and Kandao Qoocam), and the different characteristics and performances of the two systems were carefully evaluated and reviewed. Five different processing strategies were set up and tested for each 360° camera, based on stitched images or raw data, in which video frames with different time intervals were extracted, and both one-way and round-trip acquisition paths were adopted.
The metric accuracy of the proposed approaches was evaluated considering different features: the RMSe on GCPs and CPs, the comparison of 3D distances, and finally the extraction of several planar and cross sections by means of 2D polylines. Although analyses of RMSe and 3D distances reported good results, the extraction of planar and cross sections was crucial to identify biases in the 3D models that were generated by the different strategies. These biases were not visible in the other analyses and thus the generation of 2D sections was fundamental to assess the performances of the different approaches.
In general terms, approaches based on single cameras (4 and 5) were the best in terms of 3D metric accuracy and the level of detail of the 3D model. Nevertheless, they suffered in terms of point cloud completeness and were the most demanding in terms of processing time.
On the contrary, approaches based on stitched images (1, 2, and 3), in general, performed slightly worse in terms of the 3D positional accuracy, but slightly better in terms of completeness of the model, and clearly better concerning the processing time.
If only the RMSe values on GCPs and CPs are considered (Table 7), together with the deviation between the 3D distances calculated from the total station measurements and those extracted from the models generated with the different processing approaches (Table 8), it is possible to state that the data derived following each of the five approaches for both the cameras met the accuracy requirements of a nominal map scale in the range from 1:200 to 1:300.
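The link between a nominal map scale and an expected accuracy can be made explicit with a short sketch. The threshold used by the authors is not stated in the text; the example assumes the common convention of a 0.2 mm graphical error at the scale of the drawing, and the function names and the RMSe values in the usage note are hypothetical.

```python
def scale_tolerance_m(scale_denominator, graphic_error_mm=0.2):
    """Tolerance in meters for a 1:scale_denominator drawing.

    The 0.2 mm default graphical error is an assumed convention,
    not a value taken from the paper.
    """
    if scale_denominator <= 0 or graphic_error_mm <= 0:
        raise ValueError("scale and graphical error must be positive")
    return graphic_error_mm / 1000.0 * scale_denominator


def meets_scale(rmse_m, scale_denominator, graphic_error_mm=0.2):
    """True if an RMSe (in meters) is within the tolerance of the scale."""
    return rmse_m <= scale_tolerance_m(scale_denominator, graphic_error_mm)
```

Under this assumption, a 1:200 scale corresponds to a tolerance of about 4 cm and a 1:300 scale to about 6 cm, so a hypothetical RMSe of 0.05 m would satisfy `meets_scale(0.05, 300)` but not `meets_scale(0.05, 200)`.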
Nevertheless, also considering the information derived from the analyses of 2D sections extracted from the different 3D models (Appendix A), the impact of the availability and spatial distribution of GCPs on the accuracy consistency of the 3D models is evident.
In the areas covered by GCPs, the deviation from the reference models (UAS, LiDAR, and SLAM) was on the order of a few centimeters, whereas in the areas where GCPs were not present it was on the order of tens of centimeters. This issue is particularly evident along the z-axis of the reference system (which corresponds to the development in height of the tower and the direction followed during the acquisition), and is probably related to the complexity of the surveyed environment (narrow spaces with low lighting), which affects the phases of image matching, camera position estimation, and tie point extraction.
This assessment regarding the fulfilment of the accuracies for a nominal map scale of 1:200/1:300 is thus true only if GCPs are available and if their spatial distribution covers the whole surveyed area, at least in the case of complex assets such as the Santa Marta belltower. Moreover, it must be stressed that local discrepancies are not easily detected using only standard positional accuracy metrics (such as RMSe on GCPs and CPs).
Therefore, it is possible to state that data acquired from 360° cameras (both stitched and single images) can be successfully used as a fast-surveying technique to derive 3D models and added-value products fitting the classical architectural representational scales.
These data can not only be used to derive traditional products such as 2D architectural drawings, but also have an intrinsic added value as 360° data: their immersive component can be used to derive virtual tours or to support the operator in the interpretation phase after the data processing. They can thus be used in the phases of 2D drawing or as a support in the generation of HBIM models, enriching the parametric information database.
Finally, a number of issues remain under investigation and will be further examined in the near future, such as the possibility of combining the 360° images/videos with the generated 3D models in a virtual environment to better manage and share the data collected in the field. The data processing phases can be further extended and refined by testing different GCP configurations (including different test sites with different characteristics) and by focusing on the estimation of I.O. parameters both for the 360° images and the single images. This issue was partially considered in this research, by adopting strategies derived from past experiences, particularly for the single camera processing [56], and requires further research, as recently demonstrated in [57].
Further tests are currently ongoing at the same test site to assess the suitability of new low-cost devices for fast surveying purposes, i.e., the LiDAR sensor available on the Apple iPad Pro and the Apple iPhone 12 Pro. These sensors and the related applications exploit both SLAM and photogrammetric algorithms to derive 3D point clouds. They can therefore be synergistically integrated with 360° cameras to enhance the positional accuracy and the completeness of the 3D models, especially in complex environments.