1. Introduction
Noncontact measurement methods are widely used in architectural heritage surveys. Depending on the sensors employed, these methods can be divided into range-based modeling via active optical sensors and image-based modeling via passive optical sensors [1]. Presently, one of the most commonly used range-based modeling methods in architectural heritage is terrestrial laser scanning (TLS). It can measure 3D geometric data with mm-level accuracy and has been applied to the digital documentation of cultural heritage since the 1990s [2]. Image-based modeling developed rapidly over the past decade, driven by structure-from-motion (SfM) algorithms and improved imaging sensors. SfM automates the process from tie-point extraction to dense-point generation without camera pre-calibration [3,4]. It speeds up conventional photogrammetry and facilitates non-expert use. Compact cameras mounted on external platforms extend image acquisition to wider areas such as low-altitude domains [5], historic urban infrastructure [6], and underwater environments [7]. These advances in data acquisition have promoted novel data-elaboration methods, from conventional computer-aided drafting (CAD) to building information modeling (BIM) [8] and 3D geographic information systems (GIS) [9].
However, in narrow scenes of architectural heritage, data acquisition is not straightforward when a reasonable budget, portability, and efficiency must be balanced [10]. Although novel laser scanners (e.g., the Leica BLK360) and mobile mapping systems (MMSs) based on simultaneous localization and mapping (SLAM) technology are portable [11,12], they retrieve poorer chromatic data than photogrammetry [13] and are limited to high-budget surveys. Photogrammetry is a low-cost and rapid measurement method, but photographing with regular lenses in narrow scenes leads to short baselines (the distance between neighboring image pairs), whereas a large baseline-to-depth (B/D) ratio is crucial to photogrammetric accuracy [14]. In narrow spaces, the field of view (FOV) of each photograph is limited; hence, a large number of images is needed for sufficient overlap, which increases both on-site labor and in-lab computation. Compared to classical cameras with regular lenses, spherical cameras offering a 360° FOV can reduce the number of images and avoid excessively short baselines, but they also introduce serious image distortions and a nonuniform ground sampling distance (GSD) [15]. Regular lenses follow the perspective projection, while the fisheye lenses that constitute a spherical camera are designed with different projection types (e.g., equidistant, equisolid, stereographic, and orthographic), among which the equidistant projection is perhaps the most common [16]:

r = f·θ

where θ is the angle between the principal axis and the incoming ray, r is the radial distance from the principal point, and f is the focal length.
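The equidistant model can be illustrated with a short numeric sketch, contrasting it with the perspective projection r = f·tan θ used by regular lenses. The focal length below is a hypothetical value chosen only for illustration:

```python
import math

def equidistant_radius(theta, f):
    # Equidistant fisheye model: image radius grows linearly with the
    # angle between the incoming ray and the principal axis.
    return f * theta

def perspective_radius(theta, f):
    # Perspective (pinhole) model for comparison: r = f * tan(theta),
    # which diverges as theta approaches 90 degrees.
    return f * math.tan(theta)

f = 1.4  # hypothetical fisheye focal length in mm
for deg in (10, 45, 85):
    theta = math.radians(deg)
    print(f"{deg:2d} deg  equidistant r = {equidistant_radius(theta, f):6.3f} mm  "
          f"perspective r = {perspective_radius(theta, f):7.3f} mm")
```

The linear angle-to-radius mapping is what lets a fisheye lens fit rays near 90° onto the sensor, where a perspective projection would require an unbounded image plane.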
Studies have been devoted to the calibration of spherical cameras [17]. However, although some commercial software enables the use of spherical cameras with a "one-click operation", it is not an option for nonexpert users pursuing measurement. Photogrammetry software such as PhotoScan [18], Pix4Dmapper [19], and ContextCapture [20] currently supports equirectangular images pre-stitched from multiple lenses. Combined with low-cost spherical cameras (less than 300 euros), such software packages introduce a novel 3D reconstruction approach for narrow scenes of architectural heritage with high portability, low cost, a moderate level of accuracy, and decent chromatic data retrieval [21,22].
When the scale of a 3D reconstruction is large, capturing video streams is much more efficient than capturing still images [23]. In narrow scenes of architectural heritage, videographing with a spherical camera while walking not only speeds up on-site work but also ensures sufficient overlap between neighboring frames and, consequently, robust transitions between different scenes (i.e., from exteriors to interiors, or from one room to another) [24]. These advantages help operators without high proficiency in estimating image overlap and enable rapid measurements in tourist sites. Despite such conveniences, videogrammetry is not frequently used for architectural heritage surveys focusing on accuracy. In contrast to still images, video frames may suffer from low resolution, blur effects, and redundant overlap [25]. At present, many low-cost spherical cameras are capable of 3.5K (ca. 8 million pixels) videography resolution, and larger resolutions are expected in the near future at the current pace of sensor upgrades. Although such resolution must cover the full 360° FOV, it still yields sufficient GSD in narrow scenes, where the distances between targets and cameras are usually only a few meters. Hence, we consider the other two issues, baselines and blur assessment, to be the main factors impacting the accuracy of spherical-camera videogrammetry.
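As a rough check on the GSD claim, an equirectangular frame maps its width onto 360°, so at the equator each pixel spans 2π/width radians. The frame width and target distance below are assumptions for illustration, not specifications of any particular camera:

```python
import math

def equirect_gsd(width_px, distance_m):
    # Ground sampling distance at the equator of an equirectangular frame:
    # one pixel covers 2*pi/width_px radians, which subtends
    # distance * angle metres on a surface at the given distance.
    return distance_m * (2 * math.pi / width_px)

# An assumed ~3840 px wide spherical frame at 2 m from the surface
gsd_mm = equirect_gsd(3840, 2.0) * 1000
print(f"GSD ~= {gsd_mm:.2f} mm/px")  # a few millimetres per pixel
```

At a few meters of distance the result stays in the millimetre range, which is why such resolution can suffice in narrow scenes despite being spread over 360°.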
Frames should be selected before 3D reconstruction, as employing all of them is usually unfavorable to accuracy and excessive for computation. With a 30 frames-per-second (fps) camera and a 1 m/s walking speed, the average distance between adjacent frames (the baseline) is only 3.3 cm. Even for narrow scenes with limited depths, such baselines are too short to form large B/D ratios. Generally, a smaller B/D ratio gives rise to more tie points, but it also increases reconstruction uncertainty and yields more noise in 3D models. In photogrammetry, a physical point appearing in three images is considered the ideal overlap ratio and guides the determination of optimal camera networks [26]. Following similar principles, several methods select key video frames for 3D reconstruction by balancing sufficient overlap against long enough baselines [27]. Improved methods take more factors into consideration, such as reprojection errors, the complexity of the scene, and the speed of camera movement [28]. Despite the excellent results these methods may achieve, they do not address the case of spherical cameras: owing to the 360° FOV, sufficient overlap is easily obtained from a minimal number of frames with long baselines. In addition, the impact of frame selection on the accuracy and density of 3D points belonging to semantic components (floors, walls, reliefs, etc.), given their distinctive geometries, surface textures, and lighting conditions, is rarely addressed, as there was no need to do so from the perspective of computer vision. This work aims at evaluating spherical-camera videogrammetry as a straightforward method for nonexpert users. Consequently, frames are simply extracted at different ratios to study the impact of baselines and are integrated with a blur assessment of video frames, as discussed in the next paragraph.
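The baseline arithmetic above, and the effect of keeping only every n-th frame, can be sketched as follows. The walking speed and frame rate are the example values from the text; the extraction ratio and target depth are illustrative assumptions:

```python
def baseline_m(fps, walk_speed_mps, extraction_ratio=1):
    # Average distance travelled between two retained frames when every
    # `extraction_ratio`-th frame is kept from the video stream.
    return walk_speed_mps / fps * extraction_ratio

def b_over_d(baseline, depth_m):
    # Baseline-to-depth ratio, the quantity driving photogrammetric accuracy.
    return baseline / depth_m

# 30 fps at 1 m/s walking speed: raw baseline of ~3.3 cm
print(f"raw baseline: {baseline_m(30, 1.0) * 100:.1f} cm")
# Keeping 1 frame in 15 stretches the baseline to 0.5 m; at a 2 m
# target depth this raises the B/D ratio from ~0.017 to 0.25.
print(f"B/D (every 15th frame, 2 m depth): "
      f"{b_over_d(baseline_m(30, 1.0, 15), 2.0):.2f}")
```

The sketch makes the trade-off explicit: the extraction ratio is the single knob that converts an unusably short video baseline into a B/D ratio adequate for measurement.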
At present, many low-cost spherical cameras have built-in electronic image stabilization. Fundamentally, image stabilization actively reduces blur by predicting future motion from past motion [29]. External gimbal stabilizers can further reduce camera shake. In addition to hardware solutions, several methods have been proposed to model motion blur and deblur the frames [30], but several minutes are usually required to deblur even a single frame. Given the dense baselines and the fact that only a small proportion of frames is used in our application, a wiser approach is simply to detect the most blurred frames and substitute them with sharper adjacent frames, which avoids potential deblurring errors and intensive computation. One of the most commonly used methods for blur assessment is the Blur Metric [31]. It evaluates blur effects in the horizontal and vertical directions and reports the perceived blur as a number ranging from 0 to 1 (with 0 being the best and 1 the worst). Another convenient blur assessment method is a built-in function of the photogrammetry software PhotoScan, named "Estimate Image Quality". It evaluates the sharpness of frames with "black-box" algorithms, also reporting values from zero upward (with zero being the worst). These two methods are employed in this study to assess the impact of blur filters on 3D reconstruction accuracy.
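The substitution strategy, detecting blurred frames and replacing them with sharper neighbors, can be sketched as below. The gradient-based score is a deliberately crude stand-in for the Blur Metric or PhotoScan's "Estimate Image Quality" (here, higher means sharper), and the search window size is an arbitrary choice:

```python
def sharpness(frame):
    # Mean absolute horizontal gradient of a grayscale frame (a list of
    # pixel rows); sharper frames have stronger local gradients.
    total = count = 0
    for row in frame:
        for a, b in zip(row, row[1:]):
            total += abs(a - b)
            count += 1
    return total / count

def filter_blurred(frames, keep_idx, window=2):
    # For each selected frame index, substitute the sharpest frame
    # within +/- `window` neighbouring positions.
    selected = []
    for i in keep_idx:
        lo, hi = max(0, i - window), min(len(frames), i + window + 1)
        selected.append(max(range(lo, hi), key=lambda j: sharpness(frames[j])))
    return selected

sharp = [[0, 255] * 4] * 4      # high-contrast (sharp) dummy frame
blurred = [[120, 130] * 4] * 4  # low-contrast (blurred) dummy frame
frames = [blurred, sharp, blurred, blurred]
print(filter_blurred(frames, [0, 2], window=1))  # picks index 1 twice
```

Because the video supplies densely spaced frames, swapping a blurred frame for a neighbor shifts the baseline by only a few centimetres, which is why this substitution is cheap compared with deblurring.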
Despite the potential of spherical-camera videogrammetry as a low-cost, rapid, and robust measurement method in narrow scenes of architectural heritage, as shown in Figure 1, its application for metric purposes with variable frame extraction ratios and blur filters requires careful study. This work addresses these issues through two observational studies varying in layout, surface texture, and lighting conditions. The methodology is simple but effective: first, 3D results are generated under different variables; then, their accuracy is assessed by comparison with pre-measured ground truth models (GTMs). The objective is to provide practical references for nonexpert users videographing with spherical cameras for metric purposes. The relative accuracy, the impact of baselines, and the impact of blur filters of the tested method, together with the existing 3D reconstruction workflow, could make it useful for measuring narrow architectural heritage owing to its low cost, high portability, and easy operation.