2.2. Calibration Procedures
Camera calibration in photogrammetry refers to the process of determining the geometric behaviour of the optical system of a camera, which is essential for accurately reconstructing 3D spatial information from images. Calibration can be performed either in a laboratory setting or, as is often the case with off-the-shelf and consumer-grade cameras, through analytical procedures. In the latter case, the process typically involves capturing images of a calibration target with known geometric properties (e.g., providing a set of GCPs) from various orientations and distances; when the target is the surveyed object itself, the procedure is often referred to as on-the-job calibration [24]. A specific parametric camera model (e.g., the Brown–Conrady model [20]) is then adopted to analytically approximate the behaviour of the projective system. The parameters of such a model, referred to as intrinsic parameters, are estimated with optimization techniques, generally through a Bundle Block Adjustment (BBA) procedure [25]. Four different calibration strategies are considered in this work: (i) full-field calibration (FF); (ii) multi-image on-the-job calibration (MI); (iii) point cloud-based calibration (PC); and (iv) self (on-the-job) calibration (SC). Each strategy is briefly described in the following sections.
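For context, the intrinsic parameters enter the standard collinearity equations on which the BBA operates; a compact form (with generic notation chosen here purely for illustration, since the exact parameterization depends on the software implementation) is
\[ \begin{pmatrix} U \\ V \\ W \end{pmatrix} = R\,(\mathbf{X} - \mathbf{X}_0), \qquad x = x_0 - c\,\frac{U}{W} + \Delta x, \qquad y = y_0 - c\,\frac{V}{W} + \Delta y \]
where \(\mathbf{X}\) is an object point, \(\mathbf{X}_0\) and \(R\) are the camera position and the rotation from the object to the camera frame (extrinsic parameters), \(c\) is the principal distance, \((x_0, y_0)\) is the principal point, and \((\Delta x, \Delta y)\) are the lens distortion corrections (e.g., Brown–Conrady terms). The intrinsic parameters are \(c\), \(x_0\), \(y_0\), and the distortion coefficients; the BBA jointly refines intrinsic and extrinsic parameters (and object points) by minimizing the squared image residuals.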
2.2.1. Full-Field Calibration—FF
Full- (or test-) field calibration (in the following referred to as FF) consists of analytically estimating the intrinsic parameters of a camera through a comprehensive calibration process that uses an optimal image block and object geometry (i.e., providing the best stability and accuracy of the parameters). In other words, a specific calibration framework is set up to provide the best conditions for the estimation of the intrinsic parameters. The results (i.e., the estimated parameters) are then used in subsequent surveying activities where the same camera is operated (but where the same optimal estimation conditions cannot be guaranteed), assuming the intrinsic parameters do not change over time. To ensure a successful FF, several requirements should be met: the camera configuration must remain unchanged throughout the acquisition, avoiding changes in zoom or focus, which also means that the object should be shot from a distance similar to the one expected in the subsequent surveys. The camera positions and optical axis directions should encompass a broad range of angles (convergent pose geometry), and object points should be visible in multiple photos, from various angles and/or at various depths from the camera. Object points should be easily and accurately identifiable, i.e., the use of coded targets is strongly advised. The object should fill most of the image frame, and some camera positions should involve rotation, using both landscape and portrait orientations, to prevent correlation between intrinsic and extrinsic parameters in the estimated analytical model.
The primary challenge with FF procedures arises when dealing with significant camera-to-object distances, as ensuring appropriate object and image block geometry can become problematic. The object must be sufficiently large to occupy most of the image frame. However, moving the camera around the object to capture multiple perspectives can become difficult.
The anticipated camera-to-object distance ranges from 50 to 100 m. At these distances, it is impractical to establish a calibration field that can be fully framed while providing targets across all areas of the camera sensor. Hence, an ad hoc target field of appropriate size was set up for this experiment, assuming that for distances greater than a few dozen meters, the focus (and consequently the optical characteristics) of the camera does not change substantially. The camera is housed in a protective box; hence, the optical path of the light rays is influenced not only by the camera's optics but also by the protective glass panel placed in front of the camera. For this reason, calibration must take place with the camera mounted in a fixed position inside the protective box, which creates further complications by requiring the use of a remote shutter release for image acquisition. The FF calibration of each camera of the fixed monitoring system was performed at the University of Newcastle (NSW, Australia). A north-facing brick wall was equipped (Figure 2) with 70 coded targets placed at approximately seven different heights (the approximate spacing between adjacent targets was 65 cm horizontally and 75 cm vertically), encompassing a total area of about 6 × 4.5 m².
The targets were surveyed from two different positions using a Leica TS11 Total Station in a local reference system (expected ground coordinate accuracy of 3 mm). Images were acquired from 15 different positions (
Figure 3), at an average distance from the object of approximately 10 m, by varying the acquisition height from the ground using three different forklift extensions within a range of 4 m and rotating the camera in two landscape and two portrait orientations. Consequently, 60 images were acquired in total for each single FF calibration.
All the analytical calibrations were performed using Agisoft Metashape v. 1.8.2 [26]. The proposed calibration procedure represents the most reliable and accurate calibration methodology that can be employed. It should also provide the best results in terms of 3D reconstruction by the monitoring system, provided that the optical parameters of the cameras remain unchanged after the FF calibration. However, it is also the most complex and costly methodology to implement, requiring a calibration field as well as specialized equipment and facilities to support the operations. If calibration needs to be repeated at regular intervals (which is strongly advisable for long-term monitoring), the monitoring system must be dismantled, transported to the calibration facility, and then reinstalled on-site.
2.2.2. Multi-Image on-the-Job Calibration—MI
As previously mentioned, the main issue with performing an on-the-job calibration directly on-site, using images acquired by the monitoring system, originates from the possibly unreliable estimation of the parameters due to the intrinsic geometric weakness of the photogrammetric block. This is particularly problematic when the image block consists of just a few camera stations (e.g., just two). Additionally, each camera has its own calibration parameters, which increases the degrees of freedom of the estimation system and consequently its numerical weakness. Moreover, in many cases, the captured object (e.g., a rock wall) is predominantly flat and does not offer the depth variation that is useful for decoupling calibration parameters.
To address this issue, the calibration strategy proposed here involves capturing a series of images during the system installation phase, aimed at building a geometrically more robust photogrammetric block. This occurs before fixing the cameras and their protective box in the intended monitoring position. In many instances, due to safety reasons, it may not be possible to acquire the images at different heights (e.g., using a forklift as in Section 2.2.1), but it is still possible to create a horizontal strip with convergent shots (i.e., photographs from different angles), also orienting the camera in landscape or portrait mode and using different distances from the object. The placement of coded targets on the object can be equally challenging, but it is still feasible to identify some natural features on the object as GCPs [27]. This is also required during installation to determine the correct georeferencing of the monitoring system. In the experiments, only three GCPs were used for this purpose. From a practical point of view, this strategy requires little effort compared with the much more complex FF calibration, while still providing good results.
Similar considerations apply to the periodic recalibration of the cameras, should the monitoring timeframe be extended. In such a case, the protective camera box needs to be removed from its position and then reinstalled, and correctly reoriented, after completing the calibration process, requiring further effort. Additionally, the manual repositioning of the camera box could result in a slightly different, and undesirable, framing compared to the previous monitoring period.
The MI on-the-job calibration was conducted considering different numbers of images (between 15 and 42 images for each camera; see Section 2.3 and Figure 4). In all the procedures, the camera was rotated between landscape and portrait orientations and pointed toward the centre of the object with a convergent pose geometry. In all cases, approximately the same camera-to-object distance as that between the monitoring positions and the object was used. All the calibrations were performed using Agisoft Metashape v. 1.8.2 [26]. Despite the possible correlation of some of the parameters, the consistency of the captured object and of the block geometry (specifically, the distance from the object) between the calibration and monitoring stages should significantly reduce the potential deformation of the reconstructed 3D model.
2.2.3. Point Cloud-Based on-the-Job Calibration—PC
To improve the quality of the calibration and ensure good parameter decoupling, an alternative to increasing the number of images in the calibration photogrammetric block and expanding their spatial distribution (as in the previous section) is to use a larger number of GCPs in the BBA.
However, determining GCPs in the form of well-recognizable natural features on the object using traditional topographic techniques could require considerable effort. To overcome this issue, a new implementation of a 3D model-controlled aerial triangulation procedure is proposed here. In 1988, the authors of [28,29] were the first to discuss the use of DEMs as additional or exclusive control data in image block orientation. More recently, approaches that indirectly orient images by comparing the image-derived 3D model with a reference one were proposed in [30,31] in 2008 and in [32] in 2018.
The procedure proposed here determines points potentially assimilable to GCPs dynamically (i.e., at each iteration of the bundle block adjustment) by identifying them on a reference 3D mesh or point cloud with a KD-tree nearest point search. In more detail, the orientation routine, at the end of each BBA iteration, extracts the current estimate of all the 3D points (i.e., the tie points) and searches the nearest point on the reference point cloud or the projection on the nearest triangle in the reference mesh. For each coordinate of every tie point, a pseudo-observation is added to the estimation system to reduce its difference from the corresponding coordinate of the nearest point on the reference surface. This is done only if the distance from the reference element is below a preset threshold, filtering out potential outliers. As the BBA iterations progress, these pseudo-observations are assigned progressively higher weights. The procedure is implemented by leveraging the capabilities of the Ceres Solver library [
33].
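The following sketch illustrates, in simplified form, how such pseudo-observations could be generated at each BBA iteration; it is not the in-house Ceres-based implementation used in this work, and all function and variable names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def pseudo_observations(tie_points, reference_cloud, threshold, weight):
    """Generate GCP-like pseudo-observations for the current BBA iteration.

    tie_points      : (N, 3) current estimates of the tie point coordinates
    reference_cloud : (M, 3) reference point cloud (e.g., from a TLS survey)
    threshold       : maximum accepted distance from the reference surface
    weight          : weight assigned to the pseudo-observations at this
                      iteration (progressively increased as the BBA proceeds)
    """
    tree = cKDTree(reference_cloud)            # KD-tree for nearest point search
    dist, idx = tree.query(tie_points, k=1)    # nearest reference point per tie point
    accepted = dist < threshold                # filter out potential outliers
    targets = reference_cloud[idx[accepted]]   # target coordinates of the pseudo-observations
    weights = np.full(targets.shape, weight)   # one weight per coordinate
    return np.flatnonzero(accepted), targets, weights
```

In an actual adjustment, each returned target coordinate would be added to the normal equations as a weighted pseudo-observation on the corresponding tie point coordinate, as described above.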
In this case, the strategy aims at calibrating the cameras using just the two images acquired by the monitoring system and a reference surface model or point cloud, which can be obtained, for instance, using a TLS (Figure 5).
If the object geometry does not change over time, or only small, localized changes are expected (e.g., due to rockfalls on a slope), the reference model can be used for calibrating (and orienting) the monitoring system over several periods. This is probably the most appealing feature from a practical point of view, since with this strategy the monitoring stations do not need to be dismantled for calibration and can operate continuously without service interruptions. If an object surface survey needs to be repeated over time, it can still be done rapidly with suitable equipment (for example, with a TLS). It is worth noting that the procedure requires the photogrammetric image block and the reference surface to be aligned in the same reference system: this allows the point search algorithm to identify matches and the orientation BBA to converge. In other words, a set of at least three GCPs surveyed on-site or extracted from the reference surface must be provided for the initial calibration procedure. Most likely, the orientation provided by the initial calibration obtained during installation is sufficient as a first approximate solution to make the BBA converge for all subsequent periods. The use of additional proper GCPs (i.e., points surveyed on-site and identified on the images) does not necessarily improve the quality of the calibration, since many (if not all) of the extracted tie points act as GCPs in the BBA. To verify this assumption, three different GCP configurations were tested in the experiments for this strategy: (i) more than 3 GCPs; (ii) 3 GCPs, which is the minimum number of points required to correctly perform the absolute orientation and consequently georeference the monitoring system; and (iii) no GCPs at all. The same configurations were tested for the next calibration strategy.
2.2.4. Self Calibration—SC
The last calibration strategy tested in this study, which is also considered the weakest in terms of accuracy, consists of using only the images coming from the monitoring system. It is, once again, an on-the-job calibration with an image block consisting of only a few images (i.e., two in the experiments presented), each with its own calibration parameters. To distinguish this approach from the strategy proposed in Section 2.2.2, it will be referred to in the following as self calibration (SC), emphasizing the fact that the system is calibrated using exclusively the data acquired for monitoring. In this context, it is crucial to have at least a good number of GCPs for accurate calibration. However, without leveraging the data collected for the PC strategy, it is generally impractical to identify and measure many natural features.
Therefore, for this type of calibration test, three GCP configurations were tested to evaluate their influence on the results: (i) more than 3 GCPs, considered a scenario likely applicable in a real-world setting without excessive effort; (ii) 3 GCPs, which is the minimum number of points required to correctly perform the absolute orientation and consequently georeference the monitoring system; and (iii) no GCPs at all. The last two scenarios are motivated by the fact that, when monitoring a natural feature and a new calibration is required, it is not guaranteed that the GCPs identified during the installation phase will still be visible and/or recognizable and will still occupy the same positions at a later stage of the monitoring. Consider, for example, the monitoring of a landslide where, very likely, all points observed on the object tend to move over time. The possibility of repeating the control survey in the case of object movements was also considered in the previous strategy; it is important to note, however, that here the on-site operations and the subsequent GCP identification on the images require a greater effort compared to the previous case, which was, instead, highly automated or automatable. All the analytical calibrations were performed using Agisoft Metashape v. 1.8.2 [26].
As previously emphasized, the main issue in calibrating image blocks with such weak geometric characteristics resides in the potential occurrence of strong correlations among the parameters affecting the solution. This may lead to systematic effects (e.g., model deformation) during the 3D reconstruction and, hence, a substantial loss of accuracy. A larger number of parameters, corresponding to more degrees of freedom in the BBA system, worsens the phenomenon. Potentially, if the optical–geometric characteristics of the cameras were approximately the same (i.e., the calibration parameters did not significantly differ), the problem of parameter correlation could be reduced by estimating a single camera model for all the monitoring cameras. This calibration method, hereinafter referred to as “Single” to indicate the use of a single camera model in the calibration process, was tested for both this strategy and the previous one (Section 2.2.3).
2.3. Test Sites
To evaluate the calibration strategies and assess their actual performance, two test sites were considered.
The first (indicated as Site 1) was an abandoned sandstone quarry located at Pilkington Reserve in Newcastle (NSW, Australia). The exposed rock face extended approximately 80 m in length and had an average height of about 6 m (Figure 6). Both 50 mm and 85 mm lenses were considered in the experiments. The area framed by the cameras was approximately 6.5 m high, and 41 and 28 m long for the 50 and 85 mm lenses, respectively.
The monitoring system was positioned on tripods facing the wall, with the cameras located 73 m away from the wall and spaced at a base length of 18 m in a slightly convergent geometric arrangement.
The level of detail obtainable with each system setup depends on the Ground Sampling Distance (GSD), which can be calculated as follows:
\[ \mathrm{GSD} = \frac{Z}{c}\, p \quad (1) \]
where \(Z\) is the object distance, \(p\) is the sensor pixel size (for the Nikon D850, this corresponds to 4.36 µm/pixel), and \(c\) is the expected principal distance of the camera, which can be approximated by the focal length of the optics.
The expected depth accuracy \(\sigma_Z\) (i.e., the precision along the average optical axis direction of the two cameras) can be estimated with the following equation:
\[ \sigma_Z = \frac{Z^2}{c\,B}\, \sigma_i \quad (2) \]
where \(\sigma_i\) is the measurement precision of the image coordinates (assumed to be equivalent to ±0.5 pixels) and \(B\) is the base length (i.e., the distance between the two cameras).
The resulting GSDs for Site 1, calculated using Equation (1), were approximately 6.3 and 3.7 mm for the 50 and 85 mm lenses, respectively. The depth accuracy (one sigma), as per Equation (2), was estimated to be around 12.5 mm (equivalent to ca. 2 times the GSD) and 7.4 mm.
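As a quick numerical check of Equations (1) and (2), the values above can be reproduced with a few lines of code (a minimal sketch; small differences with respect to the reported figures are due to rounding of the acquisition geometry):

```python
def gsd_mm(distance_m, pixel_size_um, focal_mm):
    """Ground Sampling Distance, Equation (1), in mm."""
    return distance_m * 1000.0 * (pixel_size_um / 1000.0) / focal_mm

def depth_precision_mm(distance_m, base_m, focal_mm, pixel_size_um, sigma_px=0.5):
    """One-sigma depth precision, Equation (2), in mm."""
    sigma_i_mm = sigma_px * pixel_size_um / 1000.0
    return (distance_m * 1000.0) ** 2 / (focal_mm * base_m * 1000.0) * sigma_i_mm

# Site 1: Z = 73 m, B = 18 m, Nikon D850 pixel size 4.36 um
for focal in (50, 85):
    print(focal, round(gsd_mm(73, 4.36, focal), 1),
          round(depth_precision_mm(73, 18, focal, 4.36), 1))
# Prints approximately 6.4 / 12.9 mm (50 mm lens) and 3.7 / 7.6 mm (85 mm lens),
# consistent with the 6.3 / 12.5 mm and 3.7 / 7.4 mm reported in the text.
```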
Note that, in the following analysis, all the results will be presented both in metric (mm) and GSD proportional units, so that relevant results can be easily transferred to other image blocks with similar geometric configurations (i.e., 0.25 < B/Z < 0.4) but with different distances from the object.
A total of 13 coded targets were attached to the rock face and used as GCPs. The targets were surveyed using the reflectorless mode of a Leica MS60 total station. The Leica MS60, a hybrid instrument combining a total station and a laser scanner, was also used to produce a point cloud of the rock face with more than 1.1 million points, corresponding to approximately 4100 points per m² (one point every 1.5 cm on average). The expected accuracy of the surveyed points is approximately 4 mm.
The second test site (in the following indicated as Site 2) was located in a mine site in the Hunter Valley (NSW, Australia). The observed rock face (Figure 7) measured approximately 29 m in height and 35.6 m in width. The calibration data used in the present experiments were acquired on 20 May 2022. At this site, only the 85 mm optics were tested.
The two camera units were positioned at a distance of approximately 87 m from the wall. The cameras had a base length of 32.6 m and were set up in a slightly convergent arrangement to maximize overlap. The GSD was 4.5 mm, while the depth accuracy (one sigma) was estimated at 6 mm. A total of 32 GCPs were surveyed on the rock face using a Leica MS60.
To obtain a reference surface to be used as ground truth, a Leica P40 TLS was used, acquiring a point cloud with more than 3.8 million points (ca. 3700 points per m², or one point every 1.6 cm). The expected accuracy of the surveyed points, according to the equipment's technical specifications, is approximately 5 mm.
In both sites, a reflector prism was installed on each camera unit to precisely measure its position with the total station.
2.4. Data Processing
All calibration procedures employed the BBA routines within Agisoft Metashape v. 1.8.2 [26], except for the PC strategy discussed in Section 2.2.3, which relies on in-house code based on the Ceres library [33]. Nevertheless, it is important to highlight that the PC implementation uses the same analytical camera model as Metashape, ensuring full interoperability of results and calibration parameters between the two solutions.
The calibration strategies utilized different numbers of GCPs (see
Table 1): FF utilized all 70 coded targets on the test field as GCPs, while MI used only three GCPs. For the strategies utilizing only the two installed monitoring camera stations (i.e., PC and SC), in addition to the three GCPs used for image block control, as in MI, calibration solutions without GCPs and with a higher number of GCPs (9 for the 50 mm lens, and 7 and 13 for the 85 mm lens at Sites 1 and 2, respectively) were considered to assess whether an increased effort in site surveying could improve the calibration results. The calibration utilized a Brown–Conrady [20] frame camera model. It was decided to include the parameters corresponding to the first three terms of the radial distortion expansion (K1, K2, and K3) and the two tangential distortion parameters (P1 and P2), while ignoring the affinity parameters (B1 and B2) and any additional distortion parameters, since, especially for the least redundant strategy (i.e., SC), the increased degrees of freedom of the solution would likely lead to stronger parameter correlations and worse results.
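As an illustration of the adopted parameter set, the sketch below applies the radial (K1–K3) and tangential (P1, P2) terms to normalized image coordinates in the classic Brown–Conrady form; note that the exact role and sign convention of P1/P2 differ between implementations, so exported parameters should always be interpreted according to the software documentation.

```python
def brown_conrady(x, y, K1, K2, K3, P1, P2):
    """Apply radial (K1-K3) and tangential (P1, P2) distortion terms.

    x, y are image coordinates reduced to the principal point and normalized
    by the principal distance. Classic Brown-Conrady form; parameter
    conventions may differ from those of a specific software package.
    """
    r2 = x * x + y * y
    radial = 1.0 + K1 * r2 + K2 * r2 ** 2 + K3 * r2 ** 3
    x_d = x * radial + 2.0 * P1 * x * y + P2 * (r2 + 2.0 * x * x)
    y_d = y * radial + P1 * (r2 + 2.0 * y * y) + 2.0 * P2 * x * y
    return x_d, y_d
```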
Upon completion of the calibration procedure, the interior camera parameters obtained were exported and used in an identical image block for all tests conducted at the same test site. This image block comprised two images only, with camera centre coordinates accurately measured following the procedure described in [10]. The known coordinates of the camera centres were used as additional ground control. As far as the GCP configuration is concerned, the image block had only three control points, positioned at the extreme borders of the object. A final BBA was conducted with all calibration parameters fixed to optimize the orientation solution.
For each block, dense matching was performed in Metashape using the “high-quality” setting, which, in the software terminology, means that images were down-sampled to half their original size during the image matching process. Interpolation was disabled to avoid incorrect surface reconstruction in correspondence with holes, and no decimation was applied to the mesh triangles in order to preserve all the reconstructed faces. In the depth filtering stage, employing the “aggressive” setting, the matching algorithm filtered individual pixels of the depth maps, eliminating those exhibiting a different behaviour (i.e., parallax) compared to their local neighbourhood. This aggressive approach filters depth map pixels more frequently to eliminate potentially noisy elements, albeit at the risk of occasionally removing fine details of the reconstruction.
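Assuming the Metashape 1.8 Python API (the mapping of numeric parameter values to the GUI presets should be checked against the API reference), the dense matching settings described above could be scripted roughly as follows:

```python
import Metashape

doc = Metashape.Document()
doc.open("monitoring_block.psx")      # hypothetical project path
chunk = doc.chunk

# "High" quality depth maps with aggressive depth filtering, as described above;
# downscale=2 is assumed here to correspond to the "High" preset of the GUI.
chunk.buildDepthMaps(downscale=2, filter_mode=Metashape.AggressiveFiltering)
chunk.buildDenseCloud()

doc.save()
```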
Subsequently, the resulting dense point cloud was exported for comparison with the reference model: for each test site, a ground truth TLS point cloud was acquired (see
Section 2.3 for details) and then imported and meshed in CloudCompare [
34] using the Poisson Recon Plugin.
The comparison stage encompassed two phases: initially, the photogrammetric point cloud was aligned with the reference TLS mesh through an iterative closest point (ICP) procedure [35]. This alignment aimed at mitigating or eliminating small systematic translations or rotations that could arise during the orientation stage. Across all tested scenarios, the ICP registration consistently converged within a few iterations (typically 5 to 10), with final registration residuals closely mirroring the initial ones.
Subsequently, the registered point cloud was compared with the reference mesh using a point-to-mesh algorithm. Specifically, CloudCompare's cloud-to-mesh (C2M) distance calculation was employed to determine the distance between the two models, and the distance of each point to the nearest triangle of the reference ground truth mesh was saved. An automated routine then analysed the distance distribution and computed the average, standard deviation, and Root Mean Square Error (RMSE) of the distances, excluding points too distant from the reference surface. In other words, based on the expected accuracy estimated in Section 2.3, a threshold equal to four times the expected depth precision of the ground points was set to filter out possible outliers. The total number of points in the final reconstruction and the number of filtered points were also computed and can be used as an indicator of the completeness of the reconstructed surface.
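The statistics computation is straightforward; a minimal sketch (hypothetical function and variable names) operating on the signed C2M distances exported from CloudCompare could look as follows:

```python
import numpy as np

def distance_statistics(distances_mm, sigma_z_mm):
    """Summarize signed cloud-to-mesh distances, rejecting gross outliers.

    distances_mm : signed C2M distances exported from CloudCompare
    sigma_z_mm   : expected depth precision from Equation (2)
    """
    d = np.asarray(distances_mm, dtype=float)
    threshold = 4.0 * sigma_z_mm              # outlier rejection threshold
    kept = d[np.abs(d) < threshold]
    return {
        "mean": float(kept.mean()),
        "std": float(kept.std(ddof=1)),
        "rmse": float(np.sqrt(np.mean(kept ** 2))),
        "points_total": int(d.size),
        "points_filtered": int(d.size - kept.size),  # completeness indicator
    }
```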
2.5. Experimental Program
The calibration procedures introduced in
Section 2.2 were rigorously tested at Site 1 using both 50 and 85 mm lenses and at Site 2 using the 85 mm lenses. Summarizing
Section 2.1,
Section 2.3, and
Section 2.4, for each of the monitoring configurations (i.e., Site 1–50 mm optics; Site 1–85 mm; and Site 2–85 mm), several calibration strategies were considered:
full-field calibration (FF)
multi-image on-the-job calibration (MI)
point cloud-based on-the-job calibration (PC)
self calibration (SC).
For the PC and SC strategies, the influence of using different numbers of GCPs to support the calibration procedure (7/9/13, depending on the site and optics used, 3 GCPs, or 0 GCPs) and the use of a “Single”, identical camera model for both cameras (these calibrations are referred to as PCS and SCS, respectively) were evaluated. For all the other strategies, 3 GCPs and separate camera model parameters for the two cameras were considered. A summary of the calibration strategies for Site 1 and Site 2 is reported in
Table 1.
The calibration parameters estimated in each configuration were used in a fixed stereo image block with 3 GCPs and known camera positions as additional control to obtain a point cloud via dense matching, which was compared with a ground truth TLS-acquired reference mesh. The quality of each configuration was evaluated in terms of the spatial and statistical distribution of the differences (i.e., distances) between the photogrammetric point cloud and the ground truth. Since most of the distributions were substantially Gaussian, the average and RMSE values were considered representative of the distribution, along with the number of samples (i.e., points) of the whole point cloud whose distance from the reference surface was not greater than four times the expected depth precision (i.e., the one computed with Equation (2)).