3.1. Framework Overview
Our framework focuses on improving the efficiency and robustness of the SfM optimization process for cameras with different focal lengths. As shown in Figure 2, the initialization phase involves feature extraction and matching, relative pose estimation, camera calibration, and global pose estimation. Feature extraction, matching, and pose estimation are implemented using well-established algorithms from existing frameworks [1], aiming to obtain accurate matching relations, feature trajectories, initial camera parameters, and relative poses. For reconstruction failures caused by mirrored structures, the Doppelgangers approach [30] can be used to filter out false matches. Subsequently, global information with more accurate global rotations, global translations, and observed points is obtained through rotation averaging [31] and global positioning [3]. In the optimization stage, base frames must be selected before the global PA to achieve an accurate implicit representation of the 3D points; we propose depth uncertainty for base frame selection. After the global PA, PORT can be used to further improve the accuracy and completeness of the reconstruction. In addition, camera clustering [3] combined with filtering of points with large errors can improve the robustness and accuracy of the system.
3.2. Depth Uncertainty-Based Base Frame Selection
The selection of base frames critically influences subsequent optimization. In real-world scenarios with multiple focal lengths, parallax-based selection becomes biased by ignoring focal parameters [
32]. To mitigate this, depth uncertainty evaluation incorporating camera focal length enables robust base frame selection, enhancing optimization robustness in complex situations.
Let $\mathbf{X}$ be the current estimate of the 3D feature point corresponding to the pixel coordinate $\mathbf{p}_b$ or $\mathbf{p}_f$ in frame $b$ or $f$, and let $\mathbf{d}_b$ be the vector from the optical center of camera $b$ to the 3D point. $\mathbf{R}_{bf}$ and $\mathbf{t}_{bf}$ are the relative rotation and translation between frames $b$ and $f$. Let $\alpha$ and $\beta$ denote the angles formed by the two viewing rays and the baseline $\mathbf{t}_{bf}$, shown in Figure 3; then,

$$\alpha = \arccos\frac{\mathbf{d}_b \cdot \mathbf{t}_{bf}}{\|\mathbf{d}_b\|\,\|\mathbf{t}_{bf}\|}, \qquad \beta = \arccos\frac{-\mathbf{d}_f \cdot \mathbf{t}_{bf}}{\|\mathbf{d}_f\|\,\|\mathbf{t}_{bf}\|},$$

where $\mathbf{d}_f$ denotes the vector from the optical center of frame $f$ to $\mathbf{X}$. Let $f_f$ be the camera focal length of frame $f$. The angle spanning one pixel, $\theta = \arctan(1/f_f)$, can be added to $\beta$ in order to compute the perturbed parallax angle $\gamma = \pi - \alpha - (\beta + \theta)$; thus, by applying the law of sines, the norm of the perturbed ray $\mathbf{d}_b^{+}$ is recovered:

$$\|\mathbf{d}_b^{+}\| = \frac{\sin(\beta + \theta)}{\sin \gamma}\,\|\mathbf{t}_{bf}\|.$$
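The geometry above can be sketched numerically. The following is a minimal illustration, assuming all quantities are expressed in world coordinates and using illustrative function and variable names not taken from the paper:

```python
import numpy as np

def ray_depth_by_sines(X, C_b, C_f, focal_f):
    """Recover the norm of the (one-pixel-perturbed) ray d_b via the law of
    sines in the triangle formed by the two optical centers and the point X.
    focal_f: focal length of frame f in pixels, giving the one-pixel angle."""
    d_b = X - C_b                       # ray from optical center of b to X
    d_f = X - C_f                       # ray from optical center of f to X
    t = C_f - C_b                       # baseline between the two centers
    # angles between each viewing ray and the baseline
    alpha = np.arccos(np.dot(d_b, t) / (np.linalg.norm(d_b) * np.linalg.norm(t)))
    beta = np.arccos(np.dot(d_f, -t) / (np.linalg.norm(d_f) * np.linalg.norm(t)))
    theta = np.arctan(1.0 / focal_f)    # angle spanned by one pixel in frame f
    gamma = np.pi - alpha - (beta + theta)   # perturbed parallax angle
    # law of sines: ||d_b+|| / sin(beta + theta) = ||t|| / sin(gamma)
    return np.linalg.norm(t) * np.sin(beta + theta) / np.sin(gamma)
```

With a large focal length the one-pixel angle is small, so the recovered norm stays close to the true ray length.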
If the focal length in the $x$-axis, $f_x$, differs from that in the $y$-axis, $f_y$, a virtual focal length $f_v$ can be introduced to simulate the angular change caused by moving 1 pixel along the epipolar line. $f_v$ is calculated from the fundamental matrix $\mathbf{F}$ as

$$\mathbf{l}_f = \mathbf{F}\,\mathbf{p}_b^j, \qquad f_v = \frac{f_x f_y}{\sqrt{f_y^2 \cos^2\phi + f_x^2 \sin^2\phi}},$$

where $\mathbf{l}_f$ is the epipolar line in frame $f$ corresponding to the $j$-th pixel point $\mathbf{p}_b^j$ in frame $b$, and $\phi$ is the angle between $\mathbf{l}_f$ and the $x$-axis. $\alpha$, $\beta$, and $\mathbf{t}_{bf}$ are solvable variables. Therefore, the depth uncertainty $\sigma_d$ is computed as

$$\sigma_d = \left|\, \frac{\sin(\beta + \theta)}{\sin(\pi - \alpha - \beta - \theta)}\,\|\mathbf{t}_{bf}\| - \|\mathbf{d}_b\| \,\right|, \qquad \theta = \arctan\frac{1}{f_v}.$$

Solving the above equation yields the depth uncertainty. The frame set exhibiting the minimum depth uncertainty is then selected as the base frame for the subsequent optimization.
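A compact sketch of this selection criterion follows. It assumes the virtual focal length is the reciprocal of the angular change produced by a one-pixel step along the epipolar line at angle $\phi$; names are illustrative, not from the paper:

```python
import numpy as np

def virtual_focal(fx, fy, phi):
    """Virtual focal length for a 1-pixel step along an epipolar line at
    angle phi to the x-axis (assumption: reciprocal of the angular change)."""
    return fx * fy / np.sqrt((fy * np.cos(phi)) ** 2 + (fx * np.sin(phi)) ** 2)

def depth_uncertainty(alpha, beta, t_norm, d_b_norm, fv):
    """Change in ||d_b|| when the observation in frame f moves by one pixel
    along the epipolar line (one-pixel angle theta = arctan(1/fv))."""
    theta = np.arctan(1.0 / fv)
    gamma = np.pi - alpha - (beta + theta)       # perturbed parallax angle
    d_perturbed = t_norm * np.sin(beta + theta) / np.sin(gamma)
    return abs(d_perturbed - d_b_norm)
```

Note that `virtual_focal` reduces to $f_x$ for a horizontal epipolar line ($\phi = 0$) and to $f_y$ for a vertical one, as expected.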
3.3. Global Pose-Only Adjustment
Consider the $j$-th pixel coordinate in frame $i$ as $\mathbf{p}_i^j$, and let $\mathbf{x}_i^j$ be the $j$-th normalized coordinate of the 3D feature point $\mathbf{X}_j$ in frame $i$. We use the DPO constraints to obtain the depth $d_b^j$ in frame $b$, by which we can also obtain the pose-only description of the 3D feature point $\mathbf{X}_j$:

$$d_b^j = \frac{\left\|[\mathbf{x}_f^j]_\times\, \mathbf{t}_{bf}\right\|}{\left\|[\mathbf{x}_f^j]_\times\, \mathbf{R}_{bf}\, \mathbf{x}_b^j\right\|}, \qquad \mathbf{X}_j = g(\mathbf{R}_b, \mathbf{t}_b, \mathbf{R}_f, \mathbf{t}_f) = \mathbf{R}_b^\top\!\left(d_b^j\, \mathbf{x}_b^j - \mathbf{t}_b\right),$$

where $\mathbf{x}_b^j$ and $\mathbf{x}_f^j$ denote the normalized coordinates in frames $b$ and $f$, and $[\mathbf{x}_f^j]_\times$ is the antisymmetric matrix of $\mathbf{x}_f^j$. $\mathbf{R}_b, \mathbf{t}_b$ and $\mathbf{R}_f, \mathbf{t}_f$ denote the global rotations and translations of frames $b$ and $f$, $g(\cdot)$ denotes the pose-only representation function of the 3D point $\mathbf{X}_j$, $\mathbf{R}_{bf} = \mathbf{R}_f \mathbf{R}_b^\top$, and $\mathbf{t}_{bf} = \mathbf{t}_f - \mathbf{R}_{bf}\, \mathbf{t}_b$. Then, the pixel point $\hat{\mathbf{p}}_i^j$ can be estimated by transforming $\mathbf{X}_j$ from the normalization plane into frame $i$, followed by reprojection. The process shown in Figure 4 can be formulated as

$$\hat{\mathbf{p}}_i^j = \pi\!\left( D\!\left( N\!\left( T\!\left(\mathbf{X}_j;\, \mathbf{R}_i, \mathbf{t}_i\right) \right);\, \mathbf{q}_i \right);\, \mathbf{k}_i \right),$$

where $\pi(\cdot)$, $D(\cdot)$, $N(\cdot)$, and $T(\cdot)$ denote the perspective projection function, the distortion function, the normalization function, and the rigid transformation function, respectively. $\mathbf{q}_i$ and $\mathbf{k}_i$, respectively, denote the distortion parameter vector and the intrinsic parameter vector, including the focal length of the camera, for the pending frame $i$.
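The pose-only point representation and the reprojection chain can be sketched as follows, assuming a world-to-camera convention $\mathbf{x}_{\mathrm{cam}} = \mathbf{R}\mathbf{X} + \mathbf{t}$ and a standard radial/tangential distortion model; function names are illustrative:

```python
import numpy as np

def skew(v):
    """Antisymmetric (cross-product) matrix of a 3-vector."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def pose_only_point(x_b, x_f, R_b, t_b, R_f, t_f):
    """Depth in base frame b from the two-view (DPO-style) constraint,
    then back-projection of the point to world coordinates."""
    R_bf = R_f @ R_b.T
    t_bf = t_f - R_bf @ t_b
    d_b = np.linalg.norm(skew(x_f) @ t_bf) / np.linalg.norm(skew(x_f) @ R_bf @ x_b)
    return R_b.T @ (d_b * x_b - t_b)

def project(X, R, t, q, k):
    """Reprojection chain p = pi(D(N(T(X)))) for a pending frame."""
    c = R @ X + t                        # T: rigid transform to camera frame
    x, y = c[0] / c[2], c[1] / c[2]      # N: normalization
    k1, k2, k3, p1, p2 = q
    r2 = x * x + y * y
    s = k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    u = x + x * s + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)   # D: distortion
    v = y + y * s + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    fx, fy, cx, cy = k
    return np.array([fx * u + cx, fy * v + cy])              # pi: projection
```

With noise-free normalized observations, `pose_only_point` recovers the triangulated point exactly from the two poses alone.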
The reprojection error can also be described as

$$\mathbf{e}_i^j = \hat{\mathbf{p}}_i^j - \bar{\mathbf{p}}_i^j,$$

where $\bar{\mathbf{p}}_i^j$ denotes the measured value. Then, the optimization problem for $m$ frames of individual images observing $n$ points can be described as

$$\min_{\{\mathbf{R}_i,\, \mathbf{t}_i,\, \mathbf{q}_i,\, \mathbf{k}_i\}} \; \sum_{i=1}^{m} \sum_{j=1}^{n} \left\| \mathbf{e}_i^j \right\|^2.$$
In the above optimization problem, the optimization objectives are $\mathbf{R}_i$, $\mathbf{t}_i$, $\mathbf{q}_i$, and $\mathbf{k}_i$, which denote the rotation matrices, translation vectors, distortion parameters, and intrinsic parameters, respectively, excluding the 3D feature points. After the reprojection error is obtained, nonlinear optimization can be used to obtain the required parameters. In this paper, the Levenberg–Marquardt method is used to obtain the update direction. Assume $\mathbf{J}_i = \left[\mathbf{J}_{\mathbf{R}_i},\; \mathbf{J}_{\mathbf{t}_i},\; \mathbf{J}_{\mathbf{q}_i},\; \mathbf{J}_{\mathbf{k}_i}\right]$ is the Jacobian matrix of frame $i$, where $\mathbf{J}_{\mathbf{R}_i}$, $\mathbf{J}_{\mathbf{t}_i}$, $\mathbf{J}_{\mathbf{q}_i}$, and $\mathbf{J}_{\mathbf{k}_i}$ denote the Jacobian matrices of the rotation, the translation, the distortion parameters, and the intrinsic parameters, respectively; then, each part can be written in detail as follows:

$$\mathbf{J}_{\mathbf{R}_i} = \frac{\partial \mathbf{p}_i^j}{\partial \mathbf{u}_i^j}\, \frac{\partial \mathbf{u}_i^j}{\partial \mathbf{x}_i^j}\, \frac{\partial \mathbf{x}_i^j}{\partial \mathbf{c}_i^j}\, \frac{\partial \mathbf{c}_i^j}{\partial \delta\boldsymbol{\phi}_i}, \qquad \mathbf{J}_{\mathbf{t}_i} = \frac{\partial \mathbf{p}_i^j}{\partial \mathbf{u}_i^j}\, \frac{\partial \mathbf{u}_i^j}{\partial \mathbf{x}_i^j}\, \frac{\partial \mathbf{x}_i^j}{\partial \mathbf{c}_i^j}\, \frac{\partial \mathbf{c}_i^j}{\partial \mathbf{t}_i},$$
$$\mathbf{J}_{\mathbf{q}_i} = \frac{\partial \mathbf{p}_i^j}{\partial \mathbf{u}_i^j}\, \frac{\partial \mathbf{u}_i^j}{\partial \mathbf{q}_i}, \qquad \mathbf{J}_{\mathbf{k}_i} = \frac{\partial \mathbf{p}_i^j}{\partial \mathbf{k}_i},$$

where $\delta\boldsymbol{\phi}_i$ denotes the right rotation perturbation, $\mathbf{p}_i^j$ is the $j$-th pixel coordinate in frame $i$, $\mathbf{u}_i^j$ is the $j$-th distortion coordinate in frame $i$, $\mathbf{x}_i^j$ is the $j$-th normalized image coordinate in frame $i$, and $\mathbf{c}_i^j$ is the $j$-th camera coordinate in frame $i$. Next, a detailed derivation of the intrinsic Jacobian matrix for frame $i$ is presented as an example. The derivation of the Jacobian matrix with respect to the extrinsics can be found in [12].
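Two generic helpers make the Levenberg–Marquardt machinery above concrete: a finite-difference Jacobian (useful for checking the analytic blocks) and the LM update direction $(\mathbf{J}^\top\mathbf{J} + \lambda\mathbf{I})\,\delta = -\mathbf{J}^\top\mathbf{e}$. This is a minimal sketch with illustrative names, not the paper's implementation:

```python
import numpy as np

def numeric_jacobian(f, x, eps=1e-6):
    """Forward finite-difference Jacobian of a vector function f at x."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        J[:, i] = (f(xp) - fx) / eps
    return J

def lm_step(J, e, lam):
    """One Levenberg-Marquardt update direction for residual e."""
    A = J.T @ J + lam * np.eye(J.shape[1])
    return np.linalg.solve(A, -J.T @ e)
```

For $\lambda \to 0$ this reduces to the Gauss–Newton step; large $\lambda$ yields a damped gradient-descent-like step.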
(1) Partial derivative of the distortion coordinates with respect to the normalized coordinates: In practice, owing to imperfections in camera component assembly and possible lens displacement caused by motion, radial and tangential distortions must be incorporated to depict the imaging process accurately. To facilitate the calculation of the distortion transformation, the distortion function is described in detail: the $j$-th distortion coordinate $\mathbf{u}_i^j$ in frame $i$ can be expressed in the following manner:

$$\mathbf{u}_i^j = D\!\left(\mathbf{x}_i^j;\, \mathbf{q}_i\right) = \mathbf{x}_i^j + \Delta\mathbf{x}_r + \Delta\mathbf{x}_t,$$

with

$$\Delta\mathbf{x}_r = \begin{bmatrix} x \\ y \end{bmatrix}\left(k_1 r^2 + k_2 r^4 + k_3 r^6\right), \qquad \Delta\mathbf{x}_t = \begin{bmatrix} 2 p_1 x y + p_2\left(r^2 + 2x^2\right) \\ p_1\left(r^2 + 2y^2\right) + 2 p_2 x y \end{bmatrix},$$

where $r^2 = x^2 + y^2$ and $\mathbf{u}_i^j = [u, v]^\top$. $\Delta\mathbf{x}_r$ denotes the error generated by radial distortion, $\Delta\mathbf{x}_t$ denotes the error generated by tangential distortion, and $r$ is the distance from the normalized plane point to the center. $\mathbf{q}_i = [k_1, k_2, k_3, p_1, p_2]^\top$ denotes the vector of distortion parameters; $k_1$, $k_2$, and $k_3$ are the radial distortion coefficients; and $p_1$ and $p_2$ denote the tangential distortion coefficients. Therefore, the partial derivative of the distortion point with respect to the normalized coordinates is

$$\frac{\partial \mathbf{u}_i^j}{\partial \mathbf{x}_i^j} = \begin{bmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[4pt] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{bmatrix},$$

in which

$$\begin{aligned}
\frac{\partial u}{\partial x} &= 1 + k_1 r^2 + k_2 r^4 + k_3 r^6 + 2x^2\left(k_1 + 2 k_2 r^2 + 3 k_3 r^4\right) + 2 p_1 y + 6 p_2 x,\\
\frac{\partial u}{\partial y} &= \frac{\partial v}{\partial x} = 2xy\left(k_1 + 2 k_2 r^2 + 3 k_3 r^4\right) + 2 p_1 x + 2 p_2 y,\\
\frac{\partial v}{\partial y} &= 1 + k_1 r^2 + k_2 r^4 + k_3 r^6 + 2y^2\left(k_1 + 2 k_2 r^2 + 3 k_3 r^4\right) + 6 p_1 y + 2 p_2 x,
\end{aligned}$$

where $(x, y)$ is shorthand for the normalized coordinates $\mathbf{x}_i^j$. In this paper, only three parameters are used to represent the radial distortion. Higher-order parameters rarely bring significant performance improvements in practical applications while increasing the computational complexity, so three parameters are sufficient to describe the majority of cases.
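The analytic $2\times 2$ distortion Jacobian can be cross-checked against finite differences; a minimal sketch with illustrative names:

```python
import numpy as np

def distort(x, y, q):
    """Radial (k1, k2, k3) + tangential (p1, p2) distortion of normalized (x, y)."""
    k1, k2, k3, p1, p2 = q
    r2 = x * x + y * y
    s = k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    u = x + x * s + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    v = y + y * s + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return u, v

def ddistort_dxy(x, y, q):
    """Analytic 2x2 Jacobian d(u, v) / d(x, y) of the distortion map."""
    k1, k2, k3, p1, p2 = q
    r2 = x * x + y * y
    s = k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    ds = k1 + 2 * k2 * r2 + 3 * k3 * r2 ** 2   # d s / d(r^2)
    off = 2 * x * y * ds + 2 * p1 * x + 2 * p2 * y   # symmetric off-diagonal
    return np.array([
        [1 + s + 2 * x * x * ds + 2 * p1 * y + 6 * p2 * x, off],
        [off, 1 + s + 2 * y * y * ds + 6 * p1 * y + 2 * p2 * x],
    ])
```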
Similarly, according to (17), the partial derivative of the distortion coordinates with respect to the distortion parameter vector is

$$\frac{\partial \mathbf{u}_i^j}{\partial \mathbf{q}_i} = \begin{bmatrix} x r^2 & x r^4 & x r^6 & 2xy & r^2 + 2x^2 \\ y r^2 & y r^4 & y r^6 & r^2 + 2y^2 & 2xy \end{bmatrix}.$$
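This $2\times 5$ block follows directly from the distortion model, since each parameter enters linearly; a short sketch (illustrative names):

```python
import numpy as np

def ddistort_dq(x, y):
    """Analytic 2x5 Jacobian d(u, v) / d(k1, k2, k3, p1, p2) at normalized (x, y)."""
    r2 = x * x + y * y
    return np.array([
        [x * r2, x * r2 ** 2, x * r2 ** 3, 2 * x * y, r2 + 2 * x * x],
        [y * r2, y * r2 ** 2, y * r2 ** 3, r2 + 2 * y * y, 2 * x * y],
    ])
```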
(2) Partial derivative of the pixel coordinates with respect to the distortion coordinates: Assume the camera follows the pinhole model, whose perspective projection and affine transformation are expressed as a function $\pi(\cdot)$. For a square detector unit, the skew coefficient can default to 0. Then, the intrinsic parameters of the camera include the focal length along the $x$-axis, $f_x$; the focal length along the $y$-axis, $f_y$; and the principal point $(c_x, c_y)$. Let the intrinsic vector be $\mathbf{k}_i = [f_x, f_y, c_x, c_y]^\top$; then, the pixel point $\mathbf{p}_i^j$ can be described by the following equations:

$$\mathbf{p}_i^j = \pi\!\left(\mathbf{u}_i^j;\, \mathbf{k}_i\right) = \begin{bmatrix} f_x u + c_x \\ f_y v + c_y \end{bmatrix},$$

where $\mathbf{u}_i^j$ is $[u, v]^\top$, and the partial derivative of the $j$-th pixel coordinates of frame $i$ with respect to the distortion coordinates can be described as

$$\frac{\partial \mathbf{p}_i^j}{\partial \mathbf{u}_i^j} = \begin{bmatrix} f_x & 0 \\ 0 & f_y \end{bmatrix}.$$
According to (20), the partial derivative of the pixel coordinates with respect to the intrinsic parameter vector can be obtained:

$$\frac{\partial \mathbf{p}_i^j}{\partial \mathbf{k}_i} = \begin{bmatrix} u & 0 & 1 & 0 \\ 0 & v & 0 & 1 \end{bmatrix}.$$
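The two intrinsic-related blocks are simple enough to state directly in code; a minimal sketch with illustrative names:

```python
import numpy as np

def apply_intrinsics(u, v, k):
    """Pixel coordinates from distortion coordinates: p = (fx*u + cx, fy*v + cy)."""
    fx, fy, cx, cy = k
    return np.array([fx * u + cx, fy * v + cy])

def dp_du(k):
    """Jacobian of pixel coords w.r.t. distortion coords: diag(fx, fy)."""
    fx, fy, _, _ = k
    return np.diag([fx, fy])

def dp_dk(u, v):
    """Jacobian of pixel coords w.r.t. the intrinsic vector [fx, fy, cx, cy]."""
    return np.array([[u, 0.0, 1.0, 0.0], [0.0, v, 0.0, 1.0]])
```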
Thus, we can solve the Jacobian matrix of frame $i$, and the same method yields the Jacobian matrices of frames $b$ and $f$. Note that the optimization of the intrinsic parameters of frames $b$ and $f$ is slightly different from that of frame $i$, since those intrinsics are hidden in the process of converting the pixel points to normalized coordinates.