1. Introduction
The bottleneck in turning every consumer-grade digital camera into a 3D scanner is how to automatically self-calibrate the lens distortion from arbitrary image pairs. In contrast to capturing objects by encircling them with a camera, which is a well-studied problem (see, e.g., [1]), problems arise when capturing large-scale (indoor) environments with minimal image overlap in order to reduce the effort of image acquisition. Such a method, if “black-boxed”, would enable multiple applications, for example, in real-estate management and brokering (see, e.g., [2]).
Lens distortion is a combination of lens properties and imperfections in the camera lens manufacturing process [3]. It is well known that lens distortion changes with the focal length, which in turn is altered by both zoom and focus settings. The effect of zoom is significant, while that of focus can be neglected beyond focused distances of around 15 times the nominal focal length [4,5]. For zoom lenses, the form of distortion may change from barrel at the wide end to pincushion at the tele end. In other words, the sign of the radial distortion parameters may change with zoom. Computationally, this may be dealt with, e.g., by using two separate models [6].
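To make this sign behavior concrete, here is a minimal illustrative sketch (the function name and coefficient values are ours, not from the paper): with the same polynomial radial model, flipping the sign of the coefficient switches the displacement of image points from outward to inward, which is what a barrel-to-pincushion transition amounts to computationally.

```python
# Illustrative sketch (not from the paper): the polynomial radial model
# r_tilde = r * (1 + k1 * r**2) displaces points outward for k1 > 0 and
# inward for k1 < 0, so a single sign change in the coefficient switches
# between the two distortion types.

def radial_displace(r, k1):
    """Apply the radial model r_tilde = r * (1 + k1 * r**2)."""
    return r * (1.0 + k1 * r**2)

r = 0.8  # example normalized distance from the distortion center
outward = radial_displace(r, +0.1)  # > r: point pushed away from center
inward = radial_displace(r, -0.1)   # < r: point pulled toward center
print(outward > r, inward < r)      # True True
```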
The image processing in photogrammetry traditionally consists of two steps [7,8,9]. First, the lens distortion is solved in a camera calibration together with the other interior orientation parameters, namely the focal length and the principal point, or the center of distortion. A checkerboard [10], other calibration targets [11] or robust features [8] are used to ensure accuracy. Second, a (sparse) 3D reconstruction of the desired scene is made using bundle adjustment, usually from different images than those used in the first step. The other way, which is popular in computer vision, is to streamline these two steps into one by embedding a nonlinear lens distortion model directly into the 3D reconstruction bundle adjustment [12,13,14,15,16]. This way, the same images can be used throughout the whole process, and some of the most important intrinsic camera parameters, such as the focal length and the first two radial distortion coefficients, can be recovered [14,17].
Despite the success of the one-step bundle adjustment, the inherent problem in the calculation of the reprojection error is that, in addition to the lens distortion, the external camera parameters and the parameters of the structure must also be estimated. In other words, if one is first interested only in solving the lens distortion, the structure consisting of 3D points brings unwanted free parameters. This is especially problematic when attempting to reconstruct a large area with an uncalibrated camera, for example, in the built environment. Then, the camera network is loosely connected, i.e., it has weak geometry. Different unknown variables intertwine in bundle adjustment so that, for example, lens distortion deforms shapes in the object space [18]. Moreover, the effort of collecting data is unnecessarily increased by the requirement that all data must fulfill the redundancy requirement for solving lens distortion. Even if one prefers to perform a 3D reconstruction with minimal data, e.g., three images covering each point, this typically leads to dozens or even hundreds of images. It would be beneficial if all of these available data could be used for the self-calibration of lens distortion. Therefore, it is evident that the lens distortion should be solved separately, before entering bundle adjustment.
The largest component of lens distortion is the radial component, in contrast to the decentering and in-plane terms [19]. Radial lens distortion has an ordered correlation pattern, radial symmetry or r-symmetry, which can be detected, to some extent, with a blind approach for the purpose of removing this correlation [20], but in order to achieve further accuracy, more a priori knowledge is required. On the one hand, the epipolar constraint has been studied [21,22,23,24,25]. On the other hand, these studies typically rely on a single image pair, which as an approach carries the inherent need to control not only noise but also the scene properties in terms of appropriate feature detection, to avoid risks related to method instability. What has not been studied is utilizing multiple image pairs (which may also come from different scenes), which circumvents the problem by turning the question of stability, “whether an image pair is suitable for distortion detection”, into a question of convergence: “how many image pairs are needed to accurately detect the distortion”.
In this paper, we set out to exploit the r-symmetry in determining a global estimate for the radial distortion and, in order to do so, also construct a global constraint for the center of distortion that becomes increasingly accurate as more image pairs are added. To the best of our knowledge, this has not been attempted before. We use uncalibrated images and present results on simulated data for multiple rectilinear lenses and zoom settings. In addition, real data containing false matches are used. The processing is automated to the point where our algorithm tells us whether the correction with the used model succeeds, or whether it is even needed.
The rest of this paper is organized as follows. In Section 2, the literature related to lens distortion and the epipolar constraint is reviewed. In Section 3, as a continuation of this previous work, we show how epipolar constraints from different image pairs can be merged to estimate radial distortion. However, as this method depends on r-symmetry, it requires a good estimate for the center of distortion. Therefore, in Section 4, we introduce a new symmetry metric to obtain this estimate. Results are presented in Section 5, and the conclusion follows thereafter.
2. Related Work
Correcting the lens distortion before bundle adjustment (BA) reduces the number of unknown parameters needed during it, thus increasing the robustness of the BA solution. Simultaneously, this paradigm allows an automated black-box-type use of the correction method; see the bottom arrow of
Figure 1. For these reasons, it is meaningful to study which parts of distortion can be solved already with the single image-pair correlation matrix, i.e., the fundamental matrix. The first attempt by Zhang [
21] was to expand the epipolar constraint with a distortion function so that
${\tilde{u}}^{\prime T}F\tilde{u}=0,\qquad(1)$
where ${\tilde{u}}^{T}=[\tilde{x},\tilde{y},1]$ and ${\tilde{u}}^{\prime T}=[{\tilde{x}}^{\prime},{\tilde{y}}^{\prime},1]$ denote a pair of matching undistorted keypoints on two separate images, and
F is the obtained fundamental matrix. The keypoints, in turn, are obtained from the observations ${u}^{T}=[x,y]$ and ${u}^{\prime T}=[{x}^{\prime},{y}^{\prime}]$ by using a (radial) distortion correction
$\tilde{x}=x+(x-{x}_{p})\,{k}_{1}{r}^{2},\qquad(2)$
$\tilde{y}=y+(y-{y}_{p})\,{k}_{1}{r}^{2},\qquad(3)$
where ${r}^{2}={(x-{x}_{p})}^{2}+{(y-{y}_{p})}^{2}$. This first attempt was flawed in two ways. First, rigorously, the distortion center
$p=({x}_{p},{y}_{p})$ is defined by the distortion model [
26], meaning that for each lens distortion model, there exists a unique center of distortion specific to that model. However, in the work by Zhang [
21], it was assumed to reside at the center of the image.
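As an illustration, here is a minimal sketch of this one-parameter radial correction, assuming the standard polynomial form in which each coordinate is shifted along the radial direction from the center $({x}_{p},{y}_{p})$ by a factor ${k}_{1}{r}^{2}$ (the helper name is ours, not from the paper):

```python
def undistort_point(x, y, xp, yp, k1):
    """One-parameter radial correction sketch: each coordinate is shifted
    along the radial direction from the distortion center (xp, yp) by a
    factor k1 * r**2, with r**2 measured from that center.
    (An illustration of the standard model, not the authors' code.)"""
    r2 = (x - xp) ** 2 + (y - yp) ** 2
    return x + (x - xp) * k1 * r2, y + (y - yp) * k1 * r2

# A point at the distortion center stays fixed; points farther out move more.
print(undistort_point(0.0, 0.0, 0.0, 0.0, k1=-0.05))  # (0.0, 0.0)
```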
Second, the attempt by Zhang [21] and that of Fitzgibbon [22] relied on the so-called nine-point method for solving the fundamental matrix, which does not enforce the rank-two constraint. However, even in the studies by Kukelova and Pajdla [23] and Liu and Fang [24], where the rank-two constraint was enforced, the center of distortion was still assumed to be at the center of the image, i.e., the first flaw still persisted. We shall return to these issues in the next section.
Successful attempts to determine the location of the center of distortion have been made, although these are limited to specific conditions. Hartley and Kang [
27] were able to construct a parameter-free distortion model with a center of distortion estimator by using a calibration rig, but their model, although theoretically sound, could not handle normal in situ noise without the rig. Brito et al. [
25] used the curvature of epipolar lines to determine the location of the center of distortion, but this method’s success was limited to images with curvilinear lens-type distortion, i.e., very large distortion, due to problems in detecting differences in small curvatures. In addition, one-image-pair methods typically experience vulnerabilities with respect to noise. According to Brito et al. [
25], “with more than one pixel noise the given implementation produces very large errors”.
With multiple image pairs, however, these vulnerabilities related to noise, as well as those related to false matches and inadequately distributed features, do not turn into a question of stability, but rather into a question of convergence. All data, even from different scenes, can be used to drive the result toward convergence. Therefore, the convergence of the radial distortion coefficient as a function of the number of inlier point pairs, or image pairs (since the number of points per image is typically fixed by the natural properties of the scene), is of major importance.
3. Iterative Solution for Radial Distortion
In order to determine the lens distortion using multiple image pairs, we may extend Equation (1) as
$\epsilon =\sum_{k}\sum_{i}\left|{\tilde{u}}_{i}^{\prime T}{F}_{k}{\tilde{u}}_{i}\right|,\qquad(4)$
where the vertical bars denote the ${L}_{1}$ norm, $k$ indexes image pairs and $i$ indexes point pairs. In the limit of no noise, $\epsilon \to 0$ with an ideal distortion model. Despite its theoretical simplicity, Equation (4) typically contains difficult terms that follow from camera geometry, noise and false matches. We shall return to discuss these later.
Zhang’s idea of optimizing F and ${k}_{1}$ in consecutive loops, so that one variable is estimated while the other is kept fixed, and then vice versa until both converge separately, is clever, but not numerically stable for one pair of images [21]. However, we argue that this idea may well work for multiple image pairs, especially if the previous two flaws are overcome, i.e., the center of distortion is properly estimated (see
Figure 2), and F’s rank-two constraint is enforced. Considering the former, we can then exploit r-symmetry properly. Considering the latter, the more data we have available, the more likely it is that a larger area of the image plane is covered by the matching point pairs. Hence, ${k}_{1}$ cannot enter an artificial numerical coupling (caused by finite data) with a single F. Furthermore, as the amount of data grows, the statistical error in ${k}_{1}$ diminishes. This, descriptively speaking, provides something solid against which ${k}_{1}$ can converge.
Therefore, we make ${k}_{1}$ and all ${F}_{k}$ converge in consecutive loops. In the following notation, we use image coordinates that are normalized with $a=w/4$ to avoid ill-conditioning of the F matrix, where w is the image width in pixels. The symbol η is used to mark the counterpart of ${k}_{1}$, to distinguish the two from each other. To obtain dimensionless quantities for comparison, we note that the distance from the center of distortion is $r=a\widehat{r}$, and $\eta =\widehat{\eta}/{a}^{2}$, where the ‘hat’ symbol stands for these dimensionless quantities.
Given a proposed center of distortion ${p}_{pr}$ and a radial distortion model
$\tilde{r}=r\left(1+\eta {r}^{2}\right),\qquad(5)$
the distortion coefficient η is computed as follows.
The fundamental matrices
${F}_{k}$ and the respective inliers are computed from an initial set of point pairs
$\left\{{\left(u,{u}^{\prime}\right)}_{i}\right\}$ using RANSAC and the proper rank-two constraint for
${F}_{k}$ (these initial observations are replaced on later iteration rounds with the corrected points
$\left\{{\left(\tilde{u},{\tilde{u}}^{\prime}\right)}_{i}\right\}$, which is explained later). Image pairs are indexed with
k, point pairs with
i, and the inlier mask
M indicates whether a point pair is an inlier or not,
${M}_{k}\left(i\right)=1$ or
$=0$, respectively. To prevent systematic errors in
${F}_{k}$ computation, we require that at least 15 point pairs be inliers; otherwise, the corresponding image pair is omitted. This is in line with the amount presented in the work of Hartley and Kang [
27], who theoretically determined that 80 points in a connected network of four images are needed to estimate (radial) distortion. In
${F}_{k}$ calculation, we employed the OpenCV implementation of RANSAC [
28] with a tolerance distance of three pixels. Different confidence estimate values are tested in the Results Section with real data.
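This per-pair step can be sketched as follows, assuming OpenCV’s Python API (`cv2.findFundamentalMat` with the `FM_RANSAC` method); the helper names and the synthetic masks at the end are ours:

```python
import numpy as np

MIN_INLIERS = 15  # image pairs with fewer RANSAC inliers are omitted

def fundamental_with_inliers(pts1, pts2, thresh_px=3.0, confidence=0.99):
    """Estimate F_k for one image pair with OpenCV's RANSAC (the rank-two
    constraint is enforced internally). Returns (F, boolean inlier mask M_k),
    or None when the pair yields too few inliers to be trusted."""
    import cv2  # imported here so the synthetic demo below runs without OpenCV
    F, mask = cv2.findFundamentalMat(
        pts1, pts2, cv2.FM_RANSAC,
        ransacReprojThreshold=thresh_px, confidence=confidence)
    if F is None:
        return None
    M = mask.ravel().astype(bool)
    return (F, M) if M.sum() >= MIN_INLIERS else None

def filter_pairs(masks):
    """Keep only image pairs whose inlier mask M_k has >= MIN_INLIERS inliers."""
    return [k for k, M in enumerate(masks) if int(np.sum(M)) >= MIN_INLIERS]

# Synthetic example: pair 0 has 20 inliers, pair 1 only 10, so pair 1 is dropped.
masks = [np.ones(20, dtype=bool), np.ones(10, dtype=bool)]
print(filter_pairs(masks))  # [0]
```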
For each inlier point pair, we consider the point with the largest
r, regardless of which of the two images it lies on, and, with a slight abuse of notation, denote it by
u. In other words, the distortion on the other image is neglected for now. For each
u, extending its (radial) vector
$\overrightarrow{u}$ from the proposed center of distortion
${p}_{pr}$ onto the epipolar line gives us the undistorted keypoint vector
where the angle
α between
$\overrightarrow{u}$ and the line perpendicular to the epipolar line is obtained as
In Equation (
6),
$\widehat{u}=\overrightarrow{u}/\left|\overrightarrow{u}\right|$ denotes a unit vector, and
$\overrightarrow{e}$ is a vector from
${p}_{pr}$ to point
e that is the closest point of
u on the epipolar line; see
Figure 3.
From Equation (
5), we have
where
$\tilde{r}\equiv \left\overrightarrow{\tilde{u}}\right$. Specifically, one trial value
${\eta}_{i}$ is obtained for each point pair
i. The radial distortion coefficient is obtained by taking a logarithmic average
$\mathrm{ln}\left|\eta \right|=\frac{1}{{N}^{\ast}}\sum_{i}{\alpha}_{i}\,\mathrm{ln}\left|{\eta}_{i}\right|,\qquad(9)$
where ${N}^{\ast}=\sum {\alpha}_{i}$, and the weights ${\alpha}_{i}$ are either zero or one, so that only the third of the terms with the lowest absolute values $\left|{\eta}_{i}\right|$ is retained. Then, if the majority of all of the remaining ${\eta}_{i}$ are negative, η is also declared negative.
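This trimmed logarithmic average can be sketched as follows: a minimal interpretation of ours, assuming the one-third selection and majority-sign rule described above (the helper name is hypothetical, and the trial values are assumed nonzero):

```python
import math

def log_average_eta(etas):
    """Sketch of the trimmed logarithmic average: keep the third of the trial
    values eta_i with the smallest |eta_i| (alpha_i = 1 for those, 0 for the
    rest), average their logarithms, and take the sign from the majority of
    the retained values. Assumes all eta_i are nonzero."""
    etas = sorted(etas, key=abs)
    kept = etas[:max(1, len(etas) // 3)]
    n_star = len(kept)  # N* = sum of the alpha_i weights
    magnitude = math.exp(sum(math.log(abs(e)) for e in kept) / n_star)
    negatives = sum(1 for e in kept if e < 0)
    return -magnitude if negatives > n_star - negatives else magnitude

# The few large trial values (here 5.0, -7.0, 9.0) are discarded, so they
# do not drag the estimate away from the small consistent values.
print(log_average_eta([-0.10, -0.11, 0.12, 5.0, -7.0, 9.0]) < 0)  # True
```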
In selecting the terms within Equation (9), one third is a rough estimate based on the geometry that effectively circumvents the trouble caused by epipolar lines being close to parallel with radial lines (see
Figure 3), which produces overly large values for ${\eta}_{i}$. Moreover, this offers robustness with respect to the noise and false matches present in in situ data, in contrast to the minimization criterion of Equation (4).
After solving η from Equation (9), the original uncorrected points are corrected with it, namely
$\tilde{u}=u+\left(u-{p}_{pr}\right)\eta {r}^{2}.\qquad(10)$
Here, the correction is applied to both (or all) images. However, because the distortion on the other image was previously neglected, the strength of the correction is prone to be underestimated. Because of this, when new fundamental matrices
${F}_{k}$ are obtained using corrected point pairs
$\left\{{\left(\tilde{u},{\tilde{u}}^{\prime}\right)}_{i}\right\}$ from Equation (
10), the inlier mask
${M}_{k}$ computation may result in some of the inlier point pairs still being treated as outliers. Therefore, the process of solving new
η and new
${F}_{k}$ needs to be repeated iteratively until convergence, which is reached when
η is no longer underestimated. This, in turn, is seen from the fact that the number of inliers in
${M}_{k}$ stops growing.
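The overall loop can be sketched as follows. This is a sketch of ours with hypothetical helper names (`solve_F_and_masks`, `solve_eta`, `correct`), not the authors’ implementation; each round re-estimates the fundamental matrices from the corrected points, re-solves η, and stops once the total inlier count no longer grows.

```python
def iterate_until_converged(point_pairs, p_pr, solve_F_and_masks, solve_eta,
                            correct, max_rounds=50):
    """Iterate eta and the F_k in consecutive loops until the total number
    of inliers stops growing (the convergence condition described above)."""
    eta, prev_inliers = 0.0, -1
    corrected = point_pairs
    for _ in range(max_rounds):
        Fs, masks = solve_F_and_masks(corrected)   # RANSAC per image pair
        n_inliers = sum(sum(M) for M in masks)
        if n_inliers <= prev_inliers:              # inlier set stopped growing
            break
        prev_inliers = n_inliers
        eta = solve_eta(Fs, masks, corrected, p_pr)
        # Always correct the ORIGINAL uncorrected points with the latest eta:
        corrected = correct(point_pairs, p_pr, eta)
    return eta
```

Note that the correction is always applied to the original observations rather than to already-corrected points, matching the description above.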
We summarize the convergence condition, i.e., when the iteration loop is exited. Original uncorrected points are corrected as in Equation (
10) using the latest
η of Equation (
9) on each iteration round. New
$\left\{{F}_{k}\right\}$ and their respective new inlier masks
$\left\{{M}_{k}\right\}$ are computed based on these. Specifically,
${M}_{k}\left(i\right)=1$ or $=0$, if the point pair
i is correlated, or is not correlated, via
${F}_{k}$, respectively. Hence, as the value of
η becomes less and less underestimated, the inlier set of corrected points grows on each iteration round
$d\in \mathbb{N}$, notation
${M}^{d}$, as
${M}^{1}\left\{{\left(\tilde{u},{\tilde{u}}^{\prime}\right)}_{i}\right\}\subseteq {M}^{2}\left\{{\left(\tilde{u},{\tilde{u}}^{\prime}\right)}_{i}\right\}\subseteq \dots \subseteq \left\{{\left(\tilde{u},{\tilde{u}}^{\prime}\right)}_{i}\right\}$, and when it stops growing, we have the convergence condition, namely
${M}^{d+1}={M}^{d}.\qquad(11)$
See
Figure 2b for an illustration of growing
${M}^{d}$. The algorithm pipeline is outlined in Algorithm 1. Next, we introduce how to obtain a good estimate for
${p}_{pr}$, which is required by Equation (
6).
Algorithm 1: EPOS algorithm.