LiDAR-Aided Interior Orientation Parameters Refinement Strategy for Consumer-Grade Cameras Onboard UAV Remote Sensing Systems

Abstract: Unmanned aerial vehicles (UAVs) are quickly emerging as a popular platform for 3D reconstruction/modeling in various applications such as precision agriculture, coastal monitoring, and emergency management. For such applications, LiDAR and frame cameras are the two most commonly used sensors for 3D mapping of the object space. For example, point clouds for the area of interest can be directly derived from LiDAR sensors onboard UAVs equipped with integrated global navigation satellite systems and inertial navigation systems (GNSS/INS). Imagery-based mapping, on the other hand, is considered to be a cost-effective and practical option and is often conducted by generating point clouds and orthophotos using structure from motion (SfM) techniques. Mapping with photogrammetric approaches requires accurate camera interior orientation parameters (IOPs), especially when direct georeferencing is utilized. Most state-of-the-art approaches for determining/refining camera IOPs depend on ground control points (GCPs). However, establishing GCPs is expensive and labor-intensive, and more importantly, the distribution and number of GCPs are usually less than optimal to provide adequate control for determining and/or refining camera IOPs. Moreover, consumer-grade cameras with unstable IOPs have been widely used for mapping applications. Therefore, in such scenarios, where frequent camera calibration or IOP refinement is required, GCP-based approaches are impractical. To eliminate the need for GCPs, this study uses LiDAR data as a reference surface to perform in situ refinement of camera IOPs. The proposed refinement strategy is conducted in three main steps. An image-based sparse point cloud is first generated via a GNSS/INS-assisted SfM strategy. Then, LiDAR points corresponding to the resultant image-based sparse point cloud are identified through an iterative plane fitting approach and are referred to as LiDAR control points (LCPs). Finally, the IOPs of the utilized camera are refined through a GNSS/INS-assisted bundle adjustment procedure using the LCPs. Seven datasets over two study sites with a variety of geomorphic features are used to evaluate the performance of the developed strategy. The results illustrate the ability of the proposed approach to achieve an object space absolute accuracy of 3–5 cm (i.e., 5–10 times the ground sampling distance) at a 41 m flying height.


Introduction
Unmanned aerial vehicles (UAVs) equipped with integrated global navigation satellite systems and inertial navigation systems (GNSS/INS) are gaining popularity for many applications due to the map accuracy they can achieve. Lab calibration was conducted using a 3D test field with approximately 500 coded and non-coded targets, while the in situ calibration utilized 22 GCPs over the study site. Among the presented results with different configurations, the in situ calibration with the combination of two nadir blocks in a cross-flight configuration with slightly different flying heights provided the best accuracy based on a root-mean-square error (RMSE) analysis of the check points. Our review of the existing literature reveals that the majority of the proposed in situ calibration techniques require establishing GCPs. Establishing such ground targets is expensive and labor-intensive, and more importantly, the distribution and number of GCPs are usually less than optimal to provide adequate control for determining and/or refining the camera IOPs.
Current research aims at providing an IOP refinement strategy without the need for GCPs. In the proposed approach, instead of established ground targets, a LiDAR point cloud is used as a reference surface to refine camera IOPs. More specifically, the proposed strategy consists of three main steps: (i) image-based sparse point cloud generation via a GNSS/INS-assisted SfM strategy, (ii) iterative plane fitting approach for the identification of LCPs corresponding to the generated image-based sparse point cloud, and (iii) refinement of camera IOPs using a GNSS/INS-assisted BA procedure with the help of the derived LCPs in step (ii).
The remainder of the paper is organized as follows: Section 2 focuses on related studies that use LiDAR data for camera calibration. Section 3 shows the UAV-based MMS and datasets used in this study, and Section 4 describes the utilized approaches for deriving image-based and LiDAR point clouds. A bias impact analysis of camera IOPs is then presented in Section 5 to verify the capability of LiDAR data to serve as a control surface for in situ camera calibration. The proposed IOP refinement strategy as well as the methodology adopted for assessing the quality of refined IOPs are introduced in Section 6. Finally, Section 7 presents the experimental results and Section 8 provides conclusions and recommendations for future work.

Related Work
The availability of LiDAR sensors in recent years has prompted some researchers to investigate the feasibility of using LiDAR point clouds for the in situ calibration of digital cameras. Gneeniss et al. [26] developed an in situ camera calibration and validation procedure using LiDAR data for an airborne mobile mapping system. In their approach, LiDAR control points (LCPs) were determined through a least squares surface matching between the photogrammetric and LiDAR points. More specifically, camera IOPs were refined in four steps: (i) an image-based point cloud was generated through an integrated sensor orientation (ISO) procedure, (ii) the image-based point cloud was then registered to a LiDAR surface, (iii) LCPs were automatically extracted from planar surfaces, and (iv) camera parameters were finally refined using the derived LCPs and GNSS/INS trajectory in an aerial triangulation process. The results showed that by using self-calibration with a large number of LCPs, similar results to those obtained using GCPs can be achieved. This showed that the approach can eliminate the need for establishing and maintaining expensive calibration test fields. A main limitation of this approach is that no investigation on the plane surface orientation was conducted to assign appropriate weights to the X, Y, and Z components of LCPs in the self-calibration process, i.e., similar weights were used for all components. Incorporating an appropriate weighting procedure is important because a lower accuracy is expected for the X and Y coordinates of LCPs on flat terrain when compared with the Z coordinate. Moreover, the derived LCPs do not have a good spatial variation, i.e., surfaces with a maximum slope angle of 10° are used for LCP generation. This may lead to a high correlation between the estimated intrinsic/extrinsic camera parameters, but no report on the correlation between camera parameters was presented in the paper.
Mitishita et al. [22] also proposed a procedure for the in situ calibration of off-the-shelf digital cameras integrated with a LiDAR sensor onboard an aircraft. In their work, the in situ calibration experiments were performed through a bundle adjustment process using two basic configurations of control points. In the first configuration, 12 well-distributed GCPs in a 4 km² area were used for refining the camera parameters, and the results were considered as a reference, while the second configuration exploited vertical control points from LiDAR point clouds for refining camera IOPs in an in situ self-calibration procedure. According to the results, the configuration with LiDAR-based vertical control points provided IOP values very similar to the reference values obtained using GCPs. However, the calibration procedure was not able to estimate accurate camera orientation parameters, resulting in an inaccurate object space, especially along the horizontal direction. To overcome this limitation, the authors suggested using at least two GCPs at the beginning and end of the block along with the LiDAR-based control points. Other than requiring GCPs, ignoring horizontal information from LiDAR points as well as using a simple approach (nearest neighbor interpolation) for deriving LiDAR-based control points are among the limitations of the proposed in situ calibration strategy.
In a similar work, Costa et al. [27] studied the integration of LiDAR and photogrammetric data through indirect georeferencing and in situ camera calibration. In their study, LiDAR points were initially used as control points in a bundle adjustment procedure to refine camera IOPs and were then exploited for indirect georeferencing. The LiDAR points were derived from the intersection of building roof planes through a least-squares adjustment. The results showed that indirect georeferencing using camera IOPs based on LCPs achieved better horizontal and vertical accuracy when compared with results from indirect georeferencing using IOPs from an indoor calibration procedure. The approach implemented for deriving LCPs in this study results in a very small number of control points, which is not optimal for reliably refining camera IOPs. More importantly, indirect georeferencing-based in situ calibration suffers from high correlation among the estimated camera IOPs and EOPs, which in turn can lead to inaccurate estimation of the camera parameters.
In this study, a new camera IOP refinement strategy is developed in order to overcome the above-mentioned limitations in the existing body of literature. Figure 1 illustrates the workflow of the proposed approach, which consists of four steps, namely sparse point cloud generation via GNSS/INS-assisted SfM, distance-based down-sampling of the derived sparse point cloud, LCP generation, and IOP refinement through GNSS/INS-assisted BA. The contributions of this study can be summarized as follows:
• Confirming the capability of LiDAR data to serve as a good source of control for IOP refinement in two aspects: (i) verifying the stability of the LiDAR system and the quality of reconstructed point clouds over time, and (ii) ensuring that LiDAR data, even over sites with predominantly horizontal surfaces or ones with mild slopes, are sufficient to refine camera IOPs;
• Developing a strategy that can use LiDAR data over any type of terrain to accurately refine camera IOPs without requiring any targets laid out in the surveyed area or any specific flight configuration, as long as sufficient overlap and side-lap among images are ensured.

Data Acquisition System Specifications and Dataset Description
In this section, the platform and sensors used in this study, including sensor specifications and system calibration strategy, are introduced. Then, the acquired datasets that are used to evaluate the performance of the proposed IOP refinement strategy are described.

Data Acquisition System
In this study, imagery and LiDAR data are captured by a custom-built UAV mobile mapping system, as shown in Figure 2. The system consists of a Dà-Jiāng Innovations (DJI) Matrice 600 Pro (M600P) carrying a Sony α7R III (ILCE-7RM3) RGB camera, a Velodyne Puck LITE LiDAR sensor, and an Applanix APX-15 UAV v3 GNSS/INS unit. The RGB camera, LiDAR, and GNSS/INS unit are rigidly fixed relative to one another. Direct georeferencing information, i.e., the position and orientation of the system, is provided by the APX-15 v3 unit at a 200 Hz data rate. After post-processing the GNSS/INS data, the expected positional accuracy is 2–5 cm, and the accuracy for pitch/roll and heading is 0.025° and 0.08°, respectively [28]. The Sony α7R III camera is a 42-megapixel camera with a 7952 × 5304 complementary metal oxide semiconductor (CMOS) array, 4.5 µm pixel size, and a lens with a 35 mm nominal focal length [29]. The camera is triggered at a frame interval of 1.5 s, and an event marker for each image is recorded in the GNSS/INS trajectory through a feedback signal from the camera to the GNSS/INS unit using the former's flash hot shoe [30]. The Velodyne Puck LITE, with a weight of 590 g, is a lighter version of the VLP-16 Puck (830 g) and consists of 16 channels. This sensor generates approximately 300,000 points per second with a 360° horizontal field of view and a 30° vertical field of view (±15° from the horizon, i.e., the direction perpendicular to the axis of rotation). The maximum measurement range is 100 m with a ±3 cm range accuracy [31]. As mentioned earlier, rigorous system calibration is essential for achieving accurate mapping products through direct georeferencing.
In this study, the United States Geological Survey (USGS) simultaneous multi-frame analytical calibration (SMAC) [32] distortion model is used to provide initial estimates of the camera IOPs through an indoor calibration procedure similar to the one proposed by Habib and Morgan [21]. Then, the principal distance of the camera is refined, and the mounting parameters, i.e., boresight angles and lever arm components, between the GNSS/INS unit and the onboard imaging/ranging sensors are estimated through the in situ system calibration procedure proposed by Ravi et al. [33] using a calibration dataset collected on November 11, 2019. The absolute accuracy of the LiDAR and image-based points is in the range of ±2 to ±5 cm compared to established GCPs. Moreover, a standard in situ self-calibration with additional control provided by GCPs is conducted to evaluate and refine the camera IOPs based on a dataset collected on February 12, 2020, where the flying height was 40 m and the auto focus mode was used [34]. The achieved absolute accuracy of the image-based point cloud is within ±3 cm in the X, Y, and Z directions.

Dataset Description
In this study, seven datasets were collected over two study sites, denoted as Site I and Site II, at Purdue's Agronomy Center for Research and Education (ACRE), as shown in Figure 3. Site I is an area consisting of different geomorphic features, i.e., grass, pavement, and building roof, with variations in elevation (up to 9–10 m), while Site II is over a relatively flat grassy area with a maximum of 2–3 m variations in elevation. To evaluate the absolute accuracy of the reconstructed point clouds, a total of twelve highly reflective checkerboard targets were deployed in the study sites (red boxes in Figure 3a,b). Coordinates of the centers of these checkerboard targets were determined using a real-time kinematic (RTK)-GNSS survey with a Trimble R10 GNSS receiver. Considering the distance between the rover receiver and the GNSS base station, i.e., 6 km, the expected horizontal and vertical accuracy from the R10 is in the range of ±2 to ±3 cm and ±3 to ±4 cm, respectively [35]. One should note that the checkerboard targets are only used for accuracy verification of the proposed approach and are not required for conducting the IOP refinement procedure. Table 1 summarizes the flight configurations and camera settings for the different datasets. The ground sampling distance (GSD) of the imagery is 0.6 cm at a 41 m flying height. Different focus modes were selected for the Sony camera, i.e., auto focus and manual focus. Although using auto focus mode might lead to variations in the IOPs, the expected variations would be quite minimal given that the camera-to-object distance is very large, leading to focus at infinity. Consequently, the IOPs are expected to remain the same during data acquisition (which will be verified in Section 7). As for the manual focus setting, two different focus distances, 32 and 41 m, were utilized.
Considering that the flying height is with respect to the ground and there are other objects at varying heights (e.g., a roof with an average height of 9–10 m) in Site I, the camera mainly focuses on the roof and ground under the 32 and 41 m focus distance settings, respectively. As reported in Table 1, the six datasets over Site I are named as A, B, and C, based on the utilized camera settings, i.e., auto focus, manual focus at 32 m focus distance, and manual focus at 41 m focus distance, and the dataset over Site II is denoted as D. A single mission plan was designed using the DJI GS Pro software for all datasets over Site I. Figure 4a shows a top view of the flight trajectory colored by time for the A-1 dataset. The flight mission included twelve east-west flight lines followed by three north-south flight lines (the duration of the mission was approximately 5–6 minutes). As for the D dataset, the mission plan consists of ten east-west flight lines (shown in Figure 4b). It is worth noting that the utilized flight trajectory is not the suggested flight configuration for the proposed strategy, but any suitable flight plan can be used as long as sufficient overlap and side-lap among the images are ensured.

Image and LiDAR-Based Point Cloud Generation
In this section, the utilized approaches for image-based sparse and dense point cloud generation are first described. In this study, the former, along with the corresponding image tie points, is used for IOP refinement, while the latter is utilized in camera IOP accuracy analysis. Further, the LiDAR point cloud reconstruction approach is briefly introduced.
In this study, an image-based sparse point cloud is generated through the GNSS/INS-assisted SfM strategy proposed by Hasheminasab et al. [36]. This strategy takes advantage of the available GNSS/INS trajectory to facilitate the 3D reconstruction process and is conducted in four steps: stereo-image matching using the scale invariant feature transform (SIFT) [37] algorithm, relative orientation parameter (ROP) estimation, exterior orientation parameter (EOP) recovery, and bundle adjustment (BA). In the utilized SfM strategy, the GNSS/INS trajectory is used to reduce the search space in the stereo-image matching step. As a result, matching performance can be improved when dealing with images captured over areas of a homogeneous nature, such as patches of grass, pavement, and building roofs. Further, a smaller difference-of-Gaussians (DoG) threshold for the SIFT algorithm can be applied to increase the number of extracted features, and consequently the number of matches, without introducing many matching outliers. Compared with available commercial software (e.g., Agisoft Photoscan and Pix4D) for image-based 3D reconstruction, the SfM strategy utilized in this work offers several advantages in terms of removing matching outliers, incorporating GNSS/INS information as prior information in the BA, conducting system calibration, and having access to intermediate results.
More details regarding the first three steps of the implemented SfM framework can be found in [36,38].
In the last step of the SfM framework, a GNSS/INS-assisted bundle adjustment is conducted to generate the final sparse point cloud using the modified collinearity equations [36] while involving the GNSS/INS trajectory and the mounting parameters relating the GNSS/INS body frame to the camera frame. The camera distortions are based on the USGS SMAC distortion model, where radial and decentering lens distortions are considered [32]. According to the SMAC model, the corrections for the lens distortion are computed using the radial distortion coefficients (K0, K1, K2, K3) and decentering distortion coefficients (P1, P2, P3). Since K0 is highly correlated with the principal distance c, and K3 is considered only for cameras with a large angular field of view, only the radial distortion coefficients K1 and K2 are taken into account in this study. Similarly, the decentering distortion coefficient P3 is ignored as the lens misalignment of the utilized camera is not significant. The GNSS/INS position and orientation information is included in the BA model through pseudo-observations, i.e., direct observations of the unknowns. The variances of these unknowns are determined according to the accuracy specifications of the utilized GNSS/INS unit, i.e., 3 cm for position, 0.025° for roll/pitch angles, and 0.08° for the heading angle. A non-linear least squares adjustment (LSA) is conducted to refine the following parameters: (i) 3D coordinates of the object points, (ii) GNSS/INS trajectory, and/or (iii) system calibration parameters. It is worth noting that the impact of remaining matching outliers on the LSA will be quite minimal due to the large redundancy provided by the SIFT-based tie points. Besides the reconstructed sparse point cloud, an orthophoto for the area of interest can be generated using the involved imagery as well as the output of the BA procedure for a visual illustration of the mapped area.
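For illustration, the retained radial (K1, K2) and decentering (P1, P2) corrections can be sketched as follows. This is a minimal sketch of the widely used radial/decentering correction form, not the authors' implementation: the function name is hypothetical, and the sign convention (whether the coefficients model the distortion or its correction) depends on the calibration report at hand.

```python
def smac_correction(x, y, xp, yp, k1, k2, p1, p2):
    """Radial and decentering lens distortion corrections (sketch).

    x, y   : measured image coordinates (pixels)
    xp, yp : principal point offsets (pixels)
    Returns (dx, dy), the corrections to add to the measured coordinates.
    """
    xb, yb = x - xp, y - yp            # reduce coordinates to the principal point
    r2 = xb * xb + yb * yb             # squared radial distance
    radial = k1 * r2 + k2 * r2 * r2    # K1*r^2 + K2*r^4 (K0 and K3 dropped, see text)
    dx = xb * radial + p1 * (r2 + 2.0 * xb * xb) + 2.0 * p2 * xb * yb
    dy = yb * radial + 2.0 * p1 * xb * yb + p2 * (r2 + 2.0 * yb * yb)
    return dx, dy
```

Under this convention, the distortion-free coordinates entering the collinearity equations would be x + dx and y + dy.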
Figure 5a,b shows a sample of a generated orthophoto and corresponding image-based sparse point cloud for the A-1 dataset, respectively. As can be seen in Figure 5b, the reconstructed image-based 3D points (20,000 points) are not well-distributed over the study site, i.e., most points belong to the grassy area while very few points are reconstructed on the building roof and pavement surfaces. In order to generate a well-distributed image-based dense point cloud, a dense matching strategy similar to the patch-based multi-view stereo (PMVS) algorithm proposed by Furukawa and Ponce [39] is implemented. Figure 5c illustrates the generated image-based dense point cloud for the A-1 dataset. Comparing Figure 5c with Figure 5b, one can observe the improvement in the distribution as well as the number of points in the generated dense point cloud. In addition, the LiDAR point cloud is generated through the point positioning equation utilizing the range and orientation of the laser beams, position and orientation of the GNSS/INS unit, and the mounting parameters relating the GNSS/INS unit and laser unit frame [40]. A sample LiDAR point cloud for the A-1 dataset is shown in Figure 5d.
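In generic GNSS/INS-assisted laser scanning notation (the symbols here are illustrative and not necessarily those used in [40]), the point positioning equation combines the quantities listed above as:

```latex
\mathbf{r}_I^{m} = \mathbf{r}_b^{m}(t)
 + \mathbf{R}_b^{m}(t)\,\mathbf{r}_{lu}^{b}
 + \mathbf{R}_b^{m}(t)\,\mathbf{R}_{lu}^{b}\,\mathbf{r}_I^{lu}(t)
```

where \(\mathbf{r}_b^{m}(t)\) and \(\mathbf{R}_b^{m}(t)\) are the GNSS/INS-derived position and orientation of the body frame, \(\mathbf{r}_{lu}^{b}\) and \(\mathbf{R}_{lu}^{b}\) are the lever arm and boresight mounting parameters, and \(\mathbf{r}_I^{lu}(t)\) is the point in the laser unit frame derived from the measured range and beam orientation.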

Bias Impact Analysis for Camera IOPs
Assuming that the LiDAR system is stable over time (which will be verified in Section 7), the sufficiency of LiDAR data, even over sites with predominantly horizontal surfaces or ones with mild slopes, as a source of control for IOP refinement needs to be investigated. To do so, a bias impact analysis is conducted to show the deformations in the reconstructed object space caused by the presence of a bias in the camera IOPs. Based on the impact analysis of the different camera parameters, the capability of LiDAR to provide control information for decoupling the involved IOPs is evaluated. Although the impact of an erroneous/gradually varying principal distance and the rolling shutter effect on the accuracy of 3D reconstruction has been experimentally investigated by Zhou et al. [41], this section focuses on analytically and experimentally analyzing the impact of the principal distance and distortion parameters.

Analytical Bias Impact Analysis
The impact of a bias in the principal distance on the coordinates of reconstructed points is illustrated in Figure 6, assuming that the camera maintains a nadir view at a constant flying altitude. As shown in this figure, changes in the principal distance only affect the Z coordinate of the object point. The impact on the Z coordinate is also mathematically derived in Equation (1). In this equation, the altitude of the object point, h, is represented as a function of the flying altitude, H, the baseline length, B, the principal distance, c, and the x-parallax, Px, between conjugate image features. Using Equation (1), the impact of a bias in the principal distance can be derived through partial derivatives with respect to that parameter, as given by Equation (2). One can conclude that the impact of a variation in the principal distance depends on the flying height (H − h), and that a positive variation δc will lead to a negative change in the Z coordinate of the object points. As for the impact analysis of the distortion parameters, an analytical analysis would be infeasible due to the inherent complexity of the mathematical model. Therefore, such impact analysis is performed experimentally using a real dataset.
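The two equations referenced above are reconstructed here from the surrounding description (normal-case stereo geometry); the symbols follow the text, and the reconstruction is consistent with the sign behavior discussed above:

```latex
h = H - \frac{B\,c}{P_x} \qquad (1)
```

```latex
\frac{\partial h}{\partial c} = -\frac{B}{P_x} = -\frac{H - h}{c}
\quad\Longrightarrow\quad
\delta Z \approx -\frac{H - h}{c}\,\delta c \qquad (2)
```

Equation (2) confirms that a positive δc lowers the reconstructed Z, with a magnitude scaling with the flying height above the point, H − h.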


Figure 6. Impact of a bias in the principal distance (δc) on object point coordinates.

Experimental Bias Impact Analysis
The experimental bias impact is illustrated by artificially introducing a bias into each parameter of interest and evaluating the impact on the reconstructed point cloud. In addition to the distortion parameters, the principal distance of the camera is also considered to validate the outcome of the analytical derivation. One should note that distortion coefficients related to the same type of distortion (either radial or decentering lens distortion), i.e., K1/K2 and P1/P2, exhibit a similar impact on the image space and consequently the object space. Therefore, only the principal distance and the distortion coefficients K1 and P1 are considered in the experimental bias impact analysis. Table 2 shows the reference set of IOPs, estimated from the in situ self-calibration procedure, and the other sets of biased IOPs, derived by artificially introducing a bias into the three parameters (colored in red). The introduced biases are chosen to be large enough to have a distinguishable impact on the object space. To evaluate the impact of each set of camera IOPs on the object space, the GNSS/INS-assisted SfM strategy is first applied to the A-1 dataset using the reference camera IOPs. Then, using each set of biased camera IOPs, a GNSS/INS-assisted bundle adjustment is conducted to derive the 3D coordinates of the SIFT-based tie points. The derived coordinates of the tie points using the biased IOPs are compared to the ones coming from the BA involving the reference IOPs. The mean and standard deviation (STD) of the differences in the X, Y, and Z coordinates of the object points are reported in Table 3 and illustrated in Figures 7-9, where a scale bar for the color scheme is included for each figure.
Table 2. Reference and biased IOPs used in the bias impact analysis (biased parameters are colored in red).
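The per-axis comparison reported in Table 3 can be sketched as follows; the function name and data layout are hypothetical, standing in for the comparison of the two bundle adjustment outputs described above:

```python
from statistics import mean, stdev

def coordinate_discrepancies(reference, biased):
    """Per-axis mean and STD of the coordinate differences between two
    reconstructions of the same tie points (reference vs. biased IOPs).

    reference, biased: equally ordered lists of (X, Y, Z) tuples.
    Returns {"X": (mean, std), "Y": (mean, std), "Z": (mean, std)}.
    """
    stats = {}
    for axis, name in enumerate("XYZ"):
        diffs = [b[axis] - r[axis] for r, b in zip(reference, biased)]
        stats[name] = (mean(diffs), stdev(diffs))
    return stats
```

Under this scheme, a constant shift (e.g., the principal distance bias) shows up as a large mean with a near-zero STD, whereas scaling/deformation effects (e.g., the distortion biases) show up as near-zero means with larger STDs, matching the interpretation of Table 3 given below.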
Table 3. Mean and standard deviation (STD) of the differences in object point coordinates caused by a bias introduced to the reference camera IOPs for the A-1 dataset.

Figure 9. Top view of the sparse point cloud colored by (a) X discrepancy, (b) Y discrepancy, and (c) Z discrepancy caused by a bias in the decentering distortion coefficient, P1, for the A-1 dataset.
As presented in Table 3 and Figure 7a,b, the XY-differences in object point coordinates caused by a bias in the focal length (i.e., −20 pixels) are within 1 cm, which is negligible. In terms of the Z-differences shown in Table 3 and Figure 7c, a constant shift can be observed, as reflected by the 10 cm mean value and small standard deviation. The Z discrepancy in the flat terrain area is relatively constant, i.e., 10 cm. However, for object points on the building roof, the Z discrepancy decreases to 7 cm. This observation confirms the hypothesis that a variation in the principal distance leads to a shift along the vertical direction that depends on the object point altitude. Moreover, according to Equation (2) and considering the average flying height of the A-1 dataset (i.e., 41 m) as well as the camera principal distance (i.e., 8030.45 pixels), the impact of a −20 pixel variation in the principal distance on the Z coordinate is evaluated as 10.2 cm. Such a discrepancy is consistent with the observed Z difference (10 cm) shown in Table 3.
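The arithmetic above can be reproduced directly. As a sketch, assuming Equation (2) takes the scale-equivalent form ΔZ ≈ (Δc/c)·H (the equation itself is not reproduced in this excerpt):

```python
# Approximate vertical impact of a principal-distance bias, assuming
# Equation (2) has the scale-equivalent form dZ ~ (dc / c) * H.
def z_shift_from_c_bias(dc_px: float, c_px: float, height_m: float) -> float:
    """Vertical shift (m) in object space caused by a principal-distance bias."""
    return (dc_px / c_px) * height_m

# Values from the text for the A-1 dataset: c = 8030.45 px, H = 41 m, bias = -20 px
shift = z_shift_from_c_bias(-20.0, 8030.45, 41.0)
print(f"{abs(shift) * 100:.1f} cm")  # 10.2 cm, matching the reported value
```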
As for the impact of a bias in the radial distortion coefficient, K1, 5 cm STD values in the X and Y directions are observed in Table 3, with mean values equal to zero. Figure 8a,b depicts the X and Y discrepancies caused by the bias in K1. According to these figures, it can clearly be seen that the X differences range from −10 cm in the west part to 10 cm in the east part of the study site. The same spatial discrepancy pattern is observed for the Y differences from south to north. Such patterns indicate a scaling issue introduced in the XY-plane as a result of a bias in the radial distortion coefficients. Moreover, a spherical deformation pattern is observed for the Z coordinate, as shown in Figure 8c. More specifically, the Z differences change from −8 to 2 cm from the center to the periphery of the study site. Shifts/deformations caused by a bias in the decentering distortion coefficient, P1, are shown in Table 3 and Figure 9. According to the reported shifts/deformations in Table 3, one can see that a scaling issue is manifested by small mean values and larger STD values. In Figure 9, systematic deformations can be observed along the X, Y, and Z directions. The deformation pattern is not as obvious as the one caused by the radial coefficient (shown in Figure 8) due to the fact that decentering distortions, which comprise radial and tangential components, result in more complex patterns.
Based on the bias impact analysis of camera IOPs, one can conclude that the principal distance and radial/decentering lens distortions parameters exhibit different impacts on the reconstructed object points, especially along the Z direction. It should be noted that imagery captured by nadir looking cameras mounted on UAVs results in a 3D reconstruction of points that mainly belong to horizontal planes or tilted planes with mild slopes. Therefore, the corresponding LiDAR points that are identified through a point-to-plane matching procedure provide control information mainly along the vertical direction. Consequently, the in situ collected LiDAR data, which provide vertical control information, can be adequate for decoupling the involved IOPs.

Methodology for IOP Refinement and Validation
This section starts by introducing the proposed IOP refinement strategy. The approach for evaluating the accuracy of the refined IOPs is then presented. Finally, a strategy for evaluating the similarity between two sets of camera IOPs is introduced to compare the refined IOPs obtained from different initial values.

IOP Refinement
Instead of using GCPs for refining camera IOPs, the proposed strategy relies on control information extracted from LiDAR data that are acquired in the same mission. In the first step, the GNSS/INS-assisted SfM is adopted to generate a sparse point cloud and corresponding SIFT tie points using the original camera IOPs. As mentioned in Section 4, rather than the default DoG threshold in the SIFT detector (i.e., 0.007), a smaller threshold (0.003) is used in this study to extract more features and consequently generate more object points. Figure 10a,b shows the generated sparse point clouds using the two DoG thresholds, where using a smaller DoG threshold results in a significant improvement of the reconstruction results in terms of the distribution and number of object points. One should note that despite using a small DoG threshold in the SfM process, the distribution of object points is not ideal as a majority of the reconstructed object points still belong to the grassy area due to the existence of more distinctive features. It is worth noting that an uneven distribution of the control points would cause overweighting that could negatively impact the IOP refinement process [42]. To ensure a balanced and uniform distribution of the object points, in the next step, a distance-based down-sampling is performed on the generated sparse point cloud. In the down-sampling procedure, SIFT tie points corresponding to the eliminated object points are removed as well. A sample of down-sampled object points is shown in Figure 10c for the A-1 dataset. Following the object/image points down-sampling, LCPs are generated by identifying the corresponding LiDAR point for each image-based point. The conceptual basis of the proposed strategy is identifying the projections of image-based points onto neighboring LiDAR surfaces that define planar regions, as illustrated in Figure 11. 
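The distance-based down-sampling step can be sketched as follows; this is a minimal greedy scheme, and the paper's exact spacing parameter and implementation are not stated in this section:

```python
import numpy as np

def distance_downsample(points: np.ndarray, min_dist: float) -> np.ndarray:
    """Greedy distance-based down-sampling: keep a point only if it is at
    least `min_dist` away from every point already kept (illustrative sketch;
    the spacing threshold used in the paper is an assumption here)."""
    kept: list[np.ndarray] = []
    for p in points:
        if all(np.linalg.norm(p - q) >= min_dist for q in kept):
            kept.append(p)
    return np.array(kept)

pts = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [2.0, 0.0, 0.0]])
print(len(distance_downsample(pts, 1.0)))  # 2 points survive
```

When an object point is eliminated here, its SIFT tie points are removed from the image measurements as well, as described above.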
For a given point in an image-based point cloud (denoted by the blue point in Figure 11a), its closest LiDAR point is identified as a candidate corresponding point, as shown by the green point in Figure 11b. To ensure reliable correspondences, the distance between these two points must be smaller than a pre-defined value. Next, a spherical region with a pre-defined radius centered at the candidate corresponding point is created, as shown in Figure 11b. In this study, 1 and 0.5 m thresholds are used as the maximum distance and search radius, respectively. Then, an iterative plane fitting is conducted using all the LiDAR points within the spherical region, as shown in Figure 11c. Here, the term "iterative" refers to the removal of outliers based on the best-fitting plane from each iteration. In other words, the plane derived in a given iteration is used to identify and remove outlier points followed by a subsequent plane fitting conducted on the remaining points.
A point is regarded as an outlier if its normal distance from the plane exceeds a user-defined factor (2.5 in this implementation) multiplied by the RMSE of the normal distances of all the points. The remaining inlier points from each iteration are assigned weights that are inversely proportional to their normal distances from the best-fitting plane. This process is iterated until there are no more outliers, and the finally obtained plane is designated as the best-fitting plane for the neighborhood. The plane is considered valid if the RMSE of the normal distances of the points from the best-fitting plane is smaller than a pre-defined threshold (0.3 m in this study) and the ratio of LiDAR points retained through the iterative plane fitting is more than 50%. Figure 11d shows the points retained through the iterative plane fitting. Finally, the projection of the image point onto the derived best-fitting plane is regarded as the final LiDAR point corresponding to the image point, as shown by the green point in Figure 11d.
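The iterative plane fitting just described can be sketched as below. This is a simplified reconstruction: the paper's inverse-distance inlier weighting is omitted, and the thresholds follow the values stated in the text:

```python
import numpy as np

def fit_plane(pts: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Least-squares plane through pts: returns (centroid, unit normal)."""
    centroid = pts.mean(axis=0)
    # The normal is the right singular vector of the smallest singular value
    _, _, vt = np.linalg.svd(pts - centroid)
    return centroid, vt[-1]

def iterative_plane_fit(pts, factor=2.5, max_rmse=0.3, min_ratio=0.5):
    """Iteratively fit a plane, dropping points whose normal distance exceeds
    factor * RMSE, until no outliers remain (the paper's inverse-distance
    inlier weighting is omitted here for brevity)."""
    inliers = pts
    while True:
        centroid, normal = fit_plane(inliers)
        d = np.abs((inliers - centroid) @ normal)
        rmse = np.sqrt((d ** 2).mean())
        keep = d <= factor * rmse + 1e-12  # epsilon guards exact-fit cases
        if keep.all():
            break
        inliers = inliers[keep]
    valid = rmse < max_rmse and len(inliers) / len(pts) >= min_ratio
    return centroid, normal, inliers, valid

# Example: a horizontal neighborhood with one gross outlier
rng = np.random.default_rng(1)
pts = np.vstack([np.c_[rng.uniform(0, 10, (100, 2)), np.zeros(100)],
                 [[5.0, 5.0, 5.0]]])
centroid, normal, inliers, valid = iterative_plane_fit(pts)
```

The final LCP is then the projection of the image-based point p onto the valid plane, i.e., p − ((p − centroid)·normal)·normal.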
The derived LiDAR points will be used as GCPs in the following BA. One should note that the resultant LCPs are most reliable in the direction normal to the best-fitting plane and less reliable in the two directions along the plane, which, in turn, warrants the need to associate a weight matrix with each LCP in the BA step relaying the information about its reliability. To compute the weight matrix, let the local coordinate frame for the best-fitting plane be denoted by uvw, where the u, v-axes are along the plane and the w-axis is along the normal to the plane, as shown in Figure 11d. The weight matrix in the uvw-frame, denoted by P_uvw, is formed by assigning a standard deviation to each of the u, v, w-directions, denoted by σ_u, σ_v, σ_w, respectively. The standard deviations for the u and v directions are set to 1 m, while that for the w direction (normal to the plane) is set to 5 cm. Consequently, the weight matrix in the uvw-frame is given by Equation (3a), which is then transformed to derive the corresponding weight matrix in the mapping frame, P_xyz, using the rotation matrix relating the uvw and xyz frames (illustrated in Figure 11d), as given in Equation (3b). As a result, for each LCP, the weight matrix is derived based on the orientation of the corresponding best-fitting plane.
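Equations (3a) and (3b) are not reproduced in this excerpt; based on the stated standard deviations and frames, they plausibly take the following form (a hedged reconstruction, with R denoting the rotation matrix from the uvw-frame to the xyz mapping frame):

```latex
P_{uvw} =
\begin{bmatrix}
\sigma_u^{-2} & 0 & 0\\
0 & \sigma_v^{-2} & 0\\
0 & 0 & \sigma_w^{-2}
\end{bmatrix},
\qquad \sigma_u = \sigma_v = 1~\text{m},\ \ \sigma_w = 0.05~\text{m}
\tag{3a}

P_{xyz} = R^{xyz}_{uvw}\, P_{uvw}\, \big(R^{xyz}_{uvw}\big)^{T}
\tag{3b}
```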
Now that the LiDAR-based GCPs are identified, the GNSS/INS-assisted bundle adjustment, which was introduced in Section 4, is carried out to refine the camera IOPs, namely the principal distance (c) and distortion parameters (K1, K2, P1, P2). The LCPs, along with their weights, serve as well-distributed control points, which are helpful for refining the system calibration parameters. Given that, in the implemented SfM strategy, the utilized camera IOPs are only used to restrict the search space for finding conjugate features, one can argue that the SIFT-based conjugate points are not significantly affected by those IOPs. As a result, these tie points can be used as observations in the bundle adjustment procedure for camera IOP refinement. In addition, as discussed in the bias impact analysis (Section 5), a variation in the principal distance results in a constant shift in the vertical direction, which is identical to the impact of a bias in the Z component of the GNSS/INS trajectory, denoted as Z0. Therefore, in order to decouple the correlation between the camera principal distance and the Z0 of the GNSS/INS trajectory, the latter is treated as a constant value in the bundle adjustment procedure, i.e., a very small variance is assigned to the corresponding prior information. Fixing Z0 will not affect the accuracy of the estimated principal distance since Z0 is not contaminated by systematic errors; in other words, there is no systematic bias that would be absorbed by the principal distance. Through the GNSS/INS-assisted BA using LCPs, refined IOPs are estimated, and their accuracy needs to be verified. The approaches for accuracy evaluation are the focus of the forthcoming section.

Camera IOP Accuracy Evaluation
In order to evaluate the accuracy of the IOPs refined in the previous step, the LiDAR point cloud and RTK-GNSS measurements of the checkerboard targets are used as reference. Considering that the deployed targets do not cover the entire area, a well-distributed dense point cloud is also generated and then compared to the LiDAR data. This comparison provides a more comprehensive evaluation of the refined IOPs. The approach used in this study for such evaluation is illustrated in Figure 12. In this approach, an image-based dense point cloud is first generated using the refined IOPs for a comparison with the LiDAR data. As previously mentioned, the image-based dense point cloud outperforms its sparse counterpart in terms of the number of points as well as point distribution, thus resulting in a more reliable comparison with the LiDAR point cloud. To generate the image-based dense point cloud, a bundle adjustment procedure is first conducted using the SIFT-based tie points and the refined IOPs from the previous step, and then camera EOPs are derived. One should note that, although camera EOPs are not a direct product of the BA process, they can be indirectly derived from the refined GNSS/INS trajectory and mounting parameters. Next, a dense point cloud is generated using the refined camera EOPs and IOPs. Once the dense point cloud for a given dataset is generated, a point-to-point correspondence is established between the LiDAR and image-based point clouds using the same strategy introduced in Section 6.1. The image-based dense points and their corresponding LiDAR points are used to derive the discrepancy between the two point clouds. The 3D discrepancies between point pairs are incorporated into an LSA model to quantify the net discrepancy, [dx dy dz]^T, between the two point clouds.
More specifically, the resultant 3D discrepancy between the coordinates of an image-based 3D point and its corresponding LiDAR point (derived using the approach described in Section 6.1) would be the normal distance (nd) of the former from the best-fitting plane through the latter. The discrepancy is oriented along the normal vector to the plane, thus mandating the incorporation of a modified weight matrix into the LSA model, which serves to nullify the components of the discrepancy in the directions along the best-fitting plane. The modified weight matrix for each point pair is derived based on the orientation of the corresponding best-fitting plane. Let the local coordinate system of the best-fitting plane be denoted as the uvw-frame, where w = [w_x, w_y, w_z]^T denotes the unit normal vector of the plane. The strategy proposed by Renaudin et al. [43] is used to derive the modified weight matrix, wherein the components of the discrepancy in the two directions (u, v) along the plane are nullified while retaining the component normal to the plane. As proposed by Renaudin et al. [43], the weight matrix in the local coordinate system for the point along the best-fitting plane can be modified in order to nullify the components along the plane by replacing the elements corresponding to the u and v axes by zeros, henceforth denoted as the modified weight matrix, P_uvw, as given by Equation (4). The modified weight matrix in the mapping frame, P_xyz, can be derived by Equation (5).
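Equations (4) and (5) are not reproduced in this excerpt; based on the description above, they plausibly take the following form (a hedged reconstruction, with R denoting the rotation matrix from the uvw-frame to the xyz mapping frame):

```latex
P_{uvw} =
\begin{bmatrix}
0 & 0 & 0\\
0 & 0 & 0\\
0 & 0 & \sigma_w^{-2}
\end{bmatrix}
\tag{4}

P_{xyz} = R^{xyz}_{uvw}\, P_{uvw}\, \big(R^{xyz}_{uvw}\big)^{T}
\tag{5}
```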
The difference in coordinates between an image-based 3D point and its corresponding point in the LiDAR point cloud is modeled by Equation (6), where the error in the observations, [e_dx, e_dy, e_dz]^T, is assumed to have a stochastic distribution with a mean of 0 and a variance of σ_0^2 P^+. Here, σ_0 denotes the a priori variance factor of the observed discrepancies and P^+ denotes the Moore-Penrose pseudo-inverse of the rank-deficient modified weight matrix derived for each point pair depending on the orientation of the corresponding best-fitting plane in the LiDAR point cloud.
The LSA model is then solved to derive the estimates of the net discrepancy between the two point clouds, as given by Equation (7), where i denotes the point pair index. The dispersion matrix of the derived discrepancy estimates is given by Equation (8), where σ̂_0^2 denotes the a posteriori variance factor of the LSA. The dispersion matrix serves as an indicator of the accuracy of the derived net discrepancy values, where a high variance indicates low accuracy and vice versa. The accuracy of the net discrepancy estimates depends on the distribution of the orientations of the best-fitting planes corresponding to the image-to-LiDAR point pairs.
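Under the model above, each point pair contributes only the component of the discrepancy along its plane normal. A compact sketch of the resulting normal equations, assuming unit normals n_i and signed point-to-plane distances nd_i (variable names are illustrative):

```python
import numpy as np

def net_discrepancy(normals: np.ndarray, nd: np.ndarray) -> np.ndarray:
    """Estimate the net 3D discrepancy [dx, dy, dz] between two point clouds
    from signed point-to-plane distances nd_i along unit plane normals n_i.
    With a modified weight P_i proportional to n_i n_i^T, the normal
    equations reduce to (sum n_i n_i^T) d = sum n_i * nd_i."""
    N = np.einsum('ij,ik->jk', normals, normals)  # sum of n n^T  (3x3)
    b = normals.T @ nd                            # sum of n * nd (3,)
    return np.linalg.solve(N, b)

# Synthetic check: a pure shift t observed through planes of varied orientation
rng = np.random.default_rng(0)
t = np.array([0.02, -0.03, 0.05])
n = rng.normal(size=(200, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)
d_hat = net_discrepancy(n, n @ t)
print(np.allclose(d_hat, t))  # True
```

Note that if nearly all normals point along Z, as with mostly horizontal scene surfaces, the matrix of summed outer products becomes ill-conditioned in the planimetric directions, which mirrors the reliability argument made later in the paper.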
In addition to evaluating the accuracy of refined IOPs through comparison with LiDAR data, the accuracy of the image-based reconstruction is also assessed against the RTK-GNSS measurements of the twelve checkerboard targets that were set up in the field. To do so, the center points of the checkerboard targets are manually identified in the visible images. The object coordinates of the targets are then estimated through a multi-light ray intersection, using refined IOPs and EOPs from the BA procedure. Once the coordinates of targets are estimated, these coordinates are compared with the RTK-GNSS measurements, and the mean and STD of the coordinate differences are reported.

Camera IOP Consistency Analysis
In order to evaluate the robustness of the proposed strategy, the impact of the initial camera parameters on the refined IOPs is investigated. Therefore, consistency analysis approaches need to be established for comparing the refined IOPs obtained from different initial values. This comparison is conducted by separately evaluating the similarity between two IOP sets in terms of principal distance and distortion parameters. To evaluate the impact of different principal distances, their difference is first calculated, and then the impact of such a difference on the Z coordinates of the object points is derived using Equation (2). The comparison between the camera distortion parameters of two IOP sets starts with defining a virtual regular grid (with a 90×90 pixel spacing in this study) on the image plane. Then, using each set of distortion parameters, distortion-free coordinates of the grid vertices are calculated. Next, the RMSE and maximum of the differences between the two sets of distortion-free grid vertices are reported as a measure of the similarity of the two IOP sets.
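The grid-based comparison can be sketched as follows. The distortion model is assumed to be the common Brown-Conrady radial/decentering form (the paper does not reproduce its distortion equations in this excerpt), and the grid extent is an illustrative assumption:

```python
import numpy as np

def undistort_grid(k1, k2, p1, p2, half_size=4000, step=90):
    """Distortion-free positions of a regular grid of image points under an
    assumed Brown-Conrady model (pixel coordinates relative to the
    principal point; half_size is an illustrative sensor extent)."""
    ax = np.arange(-half_size, half_size + 1, step, dtype=float)
    x, y = np.meshgrid(ax, ax)
    r2 = x**2 + y**2
    dx = x * (k1 * r2 + k2 * r2**2) + p1 * (r2 + 2 * x**2) + 2 * p2 * x * y
    dy = y * (k1 * r2 + k2 * r2**2) + p2 * (r2 + 2 * y**2) + 2 * p1 * x * y
    return x - dx, y - dy

def grid_consistency(iops_a, iops_b):
    """RMSE and maximum difference (pixels) between distortion-free grids
    computed from two sets of (K1, K2, P1, P2)."""
    xa, ya = undistort_grid(*iops_a)
    xb, yb = undistort_grid(*iops_b)
    diff = np.hypot(xa - xb, ya - yb)
    return float(np.sqrt((diff**2).mean())), float(diff.max())
```

Identical parameter sets yield zero RMSE and maximum difference; any divergence between the two sets shows up directly in these two summary statistics.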

Experimental Results and Discussion
In this section, the absolute accuracy of the LiDAR- and image-based point clouds generated using the original system calibration parameters is first reported to verify the following hypotheses: (i) the LiDAR system calibration is stable and the derived data are accurate enough to be used as a source of control for in situ IOP refinement, and (ii) the accuracy of the image-based point cloud is negatively affected by inaccurate system calibration parameters due to the instability of the camera IOPs. The feasibility of the IOP refinement strategy is then evaluated for the seven datasets over the two sites. In addition, to evaluate the robustness of the proposed strategy, this study investigates the sensitivity of the IOP refinement strategy to the initial estimates of the camera IOPs.

Accuracy of LiDAR and Image-based Point Cloud
In this subsection, the accuracy of the image- and LiDAR-based point clouds derived using the original system calibration parameters is assessed through a comparison with the RTK-GNSS measurements of the target centers for the seven datasets. The comparison is conducted as follows:

• Image-based point cloud: Similar to the strategy introduced in Section 6.2, the center points of the twelve checkerboard targets are manually identified in the visible images. Then, using the original camera IOPs as well as the refined EOPs derived from the GNSS/INS-assisted SfM procedure, the 3D coordinates of the target centers are estimated through a multi-light ray intersection. Finally, the differences between the estimated and RTK-GNSS coordinates of the targets are calculated, and statistics including the mean and STD are reported.

• LiDAR-based point cloud: In order to evaluate the absolute accuracy of the LiDAR point cloud, the centers of the highly reflective checkerboard targets are first manually identified from the LiDAR point cloud based on the intensity information, and denoted as initial points. The initial points are expected to have an accuracy of ±3 to ±5 cm due to the noise level of the LiDAR data caused by (i) GNSS/INS trajectory errors, (ii) laser range/orientation measurement errors, and (iii) the nature of LiDAR pulse returns from highly reflective surfaces. Then, the strategy proposed in Section 6.1 is used to derive the best-fitting plane in the neighborhood of the initial points, and reliable Z coordinates are derived by projecting the initial points onto the defined planes. Afterwards, the accuracy of the LiDAR point clouds is assessed by evaluating the differences between the LiDAR-based and corresponding RTK-GNSS coordinates for the twelve target centers.

Table 4 reports the mean and STD of the differences between the image-based/LiDAR-based and RTK-GNSS coordinates of the twelve target centers for the seven datasets. According to the statistics reported in Table 4, there is a misalignment between the image-based point clouds and the RTK-GNSS measurements of the targets in the X, Y, and Z directions. By comparing datasets collected with the same camera settings on different dates, i.e., A-1/A-2, B-1/B-2, and C-1/C-2, one can observe that the accuracy of the reconstructed point clouds degrades over time. Among all the settings, the IOPs under auto focus mode are found to be the most unstable. Overall, the misalignments between the image-based and RTK-GNSS coordinates show that the original set of camera IOPs coming from the in situ self-calibration procedure leads to an inaccurate object space for all seven datasets due to the instability of the IOPs.
As for the accuracy assessment results of the LiDAR point clouds for the seven datasets listed in Table 4, the differences in the X and Y directions for all datasets are within 5 cm. One should note that these planimetric differences arise from the difficulty in identifying the actual center of the targets in the LiDAR data. Further, according to the Z differences presented in Table 4, the mean and STD values in the vertical direction are in the range of 1-3 cm. These values are within the expected LiDAR accuracy (around 3-5 cm). Overall, the small mean and STD values reported in Table 4 verify the accuracy of the LiDAR data over different dates (i.e., the LiDAR unit maintains the stability of its system calibration parameters over time). Furthermore, the agreement between the LiDAR and RTK-GNSS coordinates shows that the LiDAR data can be used as a reference surface for IOP refinement.

Table 4. Mean and STD of the differences between image-based/LiDAR-based and real-time kinematic-global navigation satellite systems (RTK-GNSS) coordinates of the twelve checkerboard targets for the seven datasets.


IOP Refinement Results
The original camera IOPs estimated from the in situ self-calibration procedure, as well as the refined IOPs estimated from the proposed strategy using the original set of IOPs as initial values, are presented in Table 5 for the seven datasets. In this table, the standard deviation of each estimated parameter and the derived square root of the a posteriori variance factor (σ̂_0) from the BA procedure are also reported. In the BA procedure, the weight matrix for the LSA model is derived by inverting the variance-covariance matrix for the SIFT-based image coordinate measurements and the GNSS/INS trajectory information. The a priori variances for the image coordinate measurements are set to (7 pixels)^2. In terms of the GNSS/INS information, the a priori variances for the position, roll/pitch angles, and heading angle are set to (3 cm)^2, (0.025°)^2, and (0.08°)^2, respectively. In this case, the value of σ̂_0 is ideally expected to be close to 1. As presented in Table 5, the σ̂_0 values for all the datasets turn out to be significantly less than 1, which indicates that the assigned a priori variances for the image coordinate observations (which constitute a vast majority of the observations) are too conservative. For instance, a σ̂_0 of 0.4 reveals that the image coordinate accuracy is roughly 2.8 pixels instead of 7 pixels. The estimated parameters are not highly correlated except for the radial distortion coefficients K1 and K2, which is expected due to their similar impact. Based on the low correlation as well as the small STDs of the estimated IOPs in Table 5, one can conclude that the camera IOPs can be estimated accurately using the LiDAR data as a reference.

Accuracy Analysis of Refined IOPs
In this section, the refined IOPs are first evaluated through a comparison between the LiDAR and image-based dense point clouds. To do so, as mentioned in Section 6.2, correspondences between the two point clouds are first established; then, the modified weight matrix-based LSA is used to estimate the net X, Y, and Z discrepancies (dx, dy, dz). The estimated discrepancies and the corresponding σ̂_0 for the seven datasets are presented in Table 6. The derived σ̂_0 in the LSA procedures represents the average distance between the image-LiDAR point pairs, which is around 5 cm for all datasets. The estimated X and Y discrepancies for the seven datasets are in the range of −2 to −4 cm and −3 to 6 cm, respectively, while the Z discrepancy ranges from −2 to 2 cm. A principal component analysis (PCA)-based dimensionality analysis is carried out on the dispersion matrix of the estimated net discrepancy vector. The principal components (eigenvectors) represent the directions of the discrepancies that exhibit the maximum amount of variance in the mapping frame, while the corresponding eigenvalues denote the variances along these directions. The analysis results for four sample datasets are listed in Table 7, where the derived principal components along with the corresponding percentage of the total variance are presented. It is worth mentioning that similar results are observed for all datasets and, therefore, only the results for the A-1, B-2, C-2, and D datasets are presented. It can be observed from the table that the third principal component for each dispersion matrix (highlighted in red) is almost parallel to the Z direction, and the corresponding variance accounts for less than 1% of the total variance. The other two principal components, with high variance percentages, are aligned along the planimetric directions. Such an observation reveals that the Z discrepancy estimate is the most reliable component.
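The PCA-based dimensionality analysis amounts to an eigen-decomposition of the 3×3 dispersion matrix; a minimal sketch (the example matrix is illustrative, not from Table 7):

```python
import numpy as np

def pca_dispersion(D: np.ndarray):
    """Eigen-decomposition of a 3x3 dispersion matrix: returns the principal
    directions (as columns, sorted by decreasing variance) and each
    component's percentage of the total variance."""
    evals, evecs = np.linalg.eigh(D)     # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]      # re-sort to descending variance
    evals, evecs = evals[order], evecs[:, order]
    return evecs, 100.0 * evals / evals.sum()

# Illustrative dispersion: planimetric variance much larger than vertical
D = np.diag([4e-4, 3e-4, 1e-6])
vecs, pct = pca_dispersion(D)
print(pct.round(2))  # third component (aligned with Z here) carries <1%
```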
One should note that the orientation distribution of the planes used for evaluating the discrepancy contributes to the accuracy of the discrepancy estimates. In this study, the captured aerial images mainly lead to the reconstruction of points along surfaces that are mostly horizontal (e.g., grassy areas, pavement, or building roofs). While all such object points contribute to the estimation of the Z discrepancy between the point clouds, the current estimates of the horizontal discrepancy are derived using points belonging to very few surfaces that exhibit mild slopes, such as the gable roofs for the datasets over Site I. It is worth mentioning that the noise level in the building roof area of the generated image-based dense point cloud is higher than in other parts of the point cloud due to the homogeneous texture of the gable roof. As a result, the horizontal discrepancy derived in the LSA procedure cannot reflect the actual X and Y accuracy of the generated image-based point cloud. In conclusion, the comparison against the LiDAR data verifies the vertical accuracy and, to a lesser degree, the horizontal accuracy of the image-based point cloud derived using the refined IOPs.

In addition to comparing the derived image-based point cloud with the LiDAR data, the accuracy of the refined IOPs is evaluated through a comparison between the image-based and RTK-GNSS coordinates of the twelve checkerboard targets. Table 8 reports the differences between the two sets of target coordinates. Comparing Table 8 with Table 4, one can note that the accuracy of the image-based point cloud is significantly improved after the IOP refinement in all the X, Y, and Z directions for the seven datasets. It is also clear from Table 8 that, using the refined IOPs, a horizontal accuracy of 1-2 cm can be achieved for all datasets. However, the differences in the Z direction range from 2 to 5 cm, which is an indication of a worse Z accuracy of the point cloud when compared with the planimetric accuracy.
The larger differences in the Z direction are mainly due to the fact that the LiDAR point cloud, which was used as the reference surface for the IOP refinement, and the RTK-GNSS measurements are not perfectly aligned in the Z direction: a 2 to 3 cm shift was found in the comparison between the LiDAR data and the RTK-GNSS measurements, as shown in Table 4. Moreover, it can be seen from Tables 6 and 8 that the accuracy of the reconstructed point clouds using the refined IOPs for the datasets with auto focus mode (the A and D datasets) is at the same level as that for the datasets with manual focus mode (the B and C datasets). This verifies the hypothesis that the IOPs remain the same during data acquisition under auto focus mode. According to the results reported above, one can observe that, by using the refined IOPs, accurate image-based 3D point clouds can be generated for all datasets over the two study sites.

Impact of the Initial Camera Parameters on Refined IOPs
As mentioned earlier, the proposed IOP refinement strategy is based on the hypothesis that the SIFT-based tie points are not significantly affected by the camera IOPs. Consequently, it is expected that if different sets of initial camera IOPs are used for the refinement procedure, similar camera parameters will be obtained. Besides the IOPs from the in situ self-calibration, another set of IOPs estimated from the indoor calibration procedure is used for the IOP refinement strategy. These two sets of camera IOPs, denoted as IOP-1 and IOP-2, are listed in Table 9, and the refined IOPs estimated starting from these initial sets for the seven datasets are presented in Table 10.

Table 8. Mean and STD of the differences between estimated coordinates using the refined IOPs and RTK-GNSS measurements of the twelve targets for the seven datasets.

Dataset | Statistics Criteria | X dif (m) | Y dif (m) | Z dif (m)

Table 9. Two sets of camera parameters for evaluating the impact of initial values on refined IOPs.

To check the equivalency between the two sets of refined IOPs derived from the different initial IOP sets, the consistency analysis introduced in Section 6.3 is performed to evaluate the similarity in terms of principal distance and distortion parameters. The results of this IOP equivalency analysis for the seven datasets are presented in Table 11, where the absolute differences in principal distance are within 6 pixels, leading to a 3 cm variation in the Z coordinates of the object points at a 41 m flying height. In terms of the distortion parameter comparison, the RMSE values of the differences between the two sets of distortion-free grid vertices are not significant, i.e., smaller than 0.2 pixels in both the x and y coordinates for all datasets, with a maximum difference of less than 1 pixel. Based on the conducted camera IOP consistency analysis, the refined IOPs derived from the different initial estimates are equivalent. Thus, the hypothesis that the SIFT-based points established through the GNSS/INS-assisted SfM do not depend on the initial IOPs is confirmed.

Table 11. Consistency between refined IOPs derived from different initial camera parameters: differences between principal distances along with their impact on Z coordinates, as well as RMSE and maximum of differences between distortion-free grid vertices for the seven datasets.
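The consistency metric above can be sketched as follows (a hedged illustration, not the paper's implementation; all IOP values, the sensor size, and the radial distortion model with only k1/k2 are hypothetical assumptions): remove lens distortion from a regular grid of image points using each refined IOP set, compare the distortion-free vertices, and map a principal-distance difference dc to object-space Z through dZ = H * dc / c at flying height H.

```python
import numpy as np

def undistort(xy, xp, yp, k1, k2):
    """Remove radial distortion (k1/k2 terms only) from image points,
    measured relative to the principal point (xp, yp)."""
    x = xy[:, 0] - xp
    y = xy[:, 1] - yp
    r2 = x**2 + y**2
    factor = 1.0 + k1 * r2 + k2 * r2**2
    return np.column_stack((x * factor, y * factor))

# hypothetical grid of vertices over a 6000 x 4000 px sensor
u, v = np.meshgrid(np.linspace(0, 6000, 25), np.linspace(0, 4000, 17))
grid = np.column_stack((u.ravel(), v.ravel()))

# two hypothetical refined IOP sets: (principal point xp, yp, k1, k2)
g1 = undistort(grid, 3000.0, 2000.0, 1.0e-9, -2.0e-17)
g2 = undistort(grid, 3000.5, 2000.2, 1.1e-9, -2.1e-17)

diff = g1 - g2
rmse = np.sqrt((diff**2).mean(axis=0))   # per-axis RMSE of vertex differences, in pixels
print("RMSE (x, y) [px]:", rmse)
print("max |diff| [px]:", np.abs(diff).max())

# Impact of a principal-distance difference on Z at flying height H:
# dZ = H * dc / c (c and dc are hypothetical values in pixels)
H, c, dc = 41.0, 8200.0, 6.0
print("dZ [m]:", H * dc / c)             # 41 * 6 / 8200 = 0.03 m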

Conclusions and Recommendations for Future Work
In this paper, a new strategy has been proposed for in situ IOP refinement. The key motivation for this development is eliminating the need for control targets, which suffer from the following drawbacks: (i) deploying such targets in the test field is an expensive, time-consuming, and labor-intensive process, and (ii) the spatial coverage and distribution of the targets would be sub-optimal for IOP refinement due to the limited number of manually deployed targets. Instead of using GCPs, the proposed strategy exploits the available, highly accurate LiDAR data, collected simultaneously with the imagery, to derive adequate and well-distributed control points for refining the camera IOPs. Seven datasets over two study sites with different geomorphic features were used to evaluate the performance of the developed strategy. Below are the findings of the presented study:

• LiDAR data are verified to be a reliable source of control for IOP refinement for two reasons: (i) the LiDAR system calibration parameters were shown to be stable over time, and the derived point clouds were accurate based on a comparison with the RTK-GNSS measurements of the checkerboard targets, and (ii) LiDAR data over any type of terrain cover were proven to be sufficient for IOP refinement based on the presented bias impact analysis.

• The proposed strategy eliminates the need for any targets or a specific flight configuration as long as sufficient overlap and side-lap are ensured among the images. The IOP refinement process can thus be conducted more frequently, even for each individual data collection mission, to ensure a good accuracy of the photogrammetric products when using off-the-shelf digital cameras.

• The IOP refinement results for the seven datasets over the two study sites showed small standard deviation values for the estimated camera parameters as well as low correlations among those parameters (except between K1 and K2), indicating that the refined IOPs are estimated accurately. Furthermore, the image-based point clouds generated using the refined IOPs showed good agreement with both the LiDAR point cloud and the RTK-GNSS measurements of the checkerboard targets, with an accuracy in the range of 3-5 cm at a 41 m flying height. This accuracy was within expectations considering the errors from the direct georeferencing as well as the LiDAR data. In addition, to validate the robustness of the proposed strategy, its sensitivity to the initial camera IOPs was investigated. The results revealed that different sets of initial camera parameters yield similar refined IOPs.