1. Introduction
Camera calibration is an important step in both photogrammetry and computer vision for extracting metric information from two-dimensional (2D) images. Calibration estimates the interior orientation parameters (IOPs), e.g., focal length, principal point, lens distortion, skew, and aspect ratio, as well as the exterior orientation parameters (EOPs), e.g., camera orientation and position. Various camera calibration methods have been developed using three-dimensional (3D) reference objects, 2D planes, or even lines [1]. Traditional camera calibration can be achieved using Tsai’s camera calibration model [2] or planar patterns [3]. The planar-pattern strategy only requires a known, planar calibration grid to estimate the IOPs and EOPs of the camera. Sturm and Maybank [4] summarized the singularities of calibration from one viewpoint with one or two planes. However, the calibration grid may not be easy to find and place properly, especially for in situ measurements.
Using vanishing points is an efficient way to obtain the camera pose directly in the scene by extracting parallel and perpendicular lines [5]. The geometric property of vanishing points has been well-defined in much of the photogrammetry literature. The principal point of the camera coincides with the orthocenter of the triangle whose vertices are the three vanishing points of three orthogonal directions [6]. Parallel and perpendicular lines commonly appear in man-made structures, for instance, rectangular windows, floor lines, columns, and beams, and are useful for detecting vanishing points. However, locating vanishing points precisely and accurately is a great challenge because any deviation propagates to the camera calibration and the subsequent object reconstruction processes [7].
This study developed an effective and flexible camera calibration method based on vanishing point extraction and refinement, which is particularly useful for on-site calibration. The geometric relations between vanishing points and the camera system are defined according to the collinearity condition equations. The developed algorithms require no prior knowledge of the interior or exterior camera parameters for on-site calibration. The proposed vanishing point refinement algorithm reduces the uncertainty caused by vanishing point localization errors.
  2. Vanishing Point Estimation
Projecting detected line segments in the image plane onto the Gaussian sphere is one of the classic approaches [8,9] for detecting vanishing points. Each line can be represented as a circle using the angular parameterization (azimuth and elevation) of the Gaussian sphere. Vanishing points appear as the intersections of these circles, which represent the high occurrence rate of a particular element. Thales’ theorem can optimize the position of reference points [10] to overcome the projection center location problem in the Gaussian sphere. The Thales’ circle ensures that any line segment passing through the vanishing point must be perpendicular to the principal point in an isosceles triangle. The optimal triangle area minimization can then be achieved using least-squares techniques, and the accuracy and automation can be improved using random sample consensus (RANSAC) [11,12,13]. The Hough transform is a well-known method for detecting parametric structures in images [14]. A double-cascaded Hough transform approach was introduced [15] to overcome the intrinsic limitations that prevent the extraction of the line segments along the main directions. To reduce the error and identify the possible points of intersection, a voting scheme was proposed based on a set of rules that weight each pair of intersecting line segments according to their geometric characteristics [16].
An iterated Hough transform method was proposed to help find vanishing points and lines [17,18]. The method investigated a bounded slope–intercept parametric representation by splitting the original unbounded space into three bounded subspaces in order to keep the symmetry intact. It also employed a filtering algorithm before applying the second Hough transform to help extract important information emerging in each Hough space. Also based on the cascade Hough transform, a filtering and validation algorithm was implemented to cluster the line segments and estimate the vanishing points simultaneously [19,20].
Instead of using a double transformation, another approach was to work directly on the first Hough polar plane [21]. By searching for a sinusoidal curve with appropriate amplitude and phase parameters, the least-squares minimization was applied with a weighting ratio. The ratio considered the number of times that the parameter set was observed as a mapping point of a line in the image. Similarly, Cantoni et al. [22] applied a filtering algorithm directly on the image plane after the first Hough transformation. This threshold-based filter works efficiently on edge-detected images, but the camera must be perpendicular to the reference plane and the horizon line (vanishing line) should be parallel to the X axis. Some researchers combined fuzzy clustering algorithms to separate an image into several regions [23]. For each region, vanishing lines and the vanishing point can be located individually using the Hough-based method. This can help extract local vanishing points from specific objects.
Besides using transformation-based parameter estimation, the grouping together of features that satisfy a geometric relationship can also be used to detect and estimate vanishing points and lines. For example, McLean and Kotturi [24] integrated image processing and analysis algorithms to produce a method for practical feature extraction. In their method, the use of histogram analysis, clustering, and numerical optimization to locate vanishing points eliminates the need for any a priori estimates of the number or location of vanishing points. In addition, including a line quality measure allows large line data sets to be used without decreasing the overall quality of the vanishing point estimates, further increasing the degree of automation.
There are three common types of geometric grouping [25]: (1) a family of equally spaced coplanar parallel lines; (2) a planar pattern obtained by repeating some elements translating in the plane; and (3) a set of elements arranged in a regular planar grid. The presence or absence of geometric constraints is strong evidence for or against hypotheses such as parallelism in the real world. Almansa et al. [26] developed a detection algorithm that deduced the Gaussian sphere from the Helmholtz principle proposed by [27]. They divided the image plane into radial vanishing regions, and used minimum description length to restrict the number of false vanishing points. However, this approach works only when the vanishing point is not located within the image boundary. The direct measurement of the raw image can be simplified [28] using a RANSAC line model and expectation maximization (EM) [29], and the J-linkage clustering algorithm [30]. Nonetheless, lens distortion and strong image noise still degrade the performance of the line extraction and grouping process.
For real-time vanishing point detection, the local dominant orientation signature (LDOS) descriptor was introduced [31] to extract structural features directly from the image domain. The descriptor divides an image into several square blocks and accumulates the edge magnitude for each of them. The candidate vanishing blocks can be estimated by comparing the spatial distances from neighboring blocks containing the perspective lines with a similar direction (orientation).
  3. Proposed Method for Vanishing Point Estimation
The proposed vanishing point estimation method consists of three parts: image pre-processing, feature detection, and vanishing point localization.
  3.1. Image Pre-Processing
The objective of pre-processing is to extract enough line segments for initial vanishing point detection. It is also useful for line-based radial distortion correction [32]. First, straight line segments with sub-pixel accuracy are extracted using the Canny edge detector [33] with an additional linking and merging process. Merging aligned edges by orthogonal regression can increase the accuracy of their location and orientation.
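The merging step can be illustrated with a minimal sketch of orthogonal (total least-squares) regression. The function name `fit_line_orthogonal` and the sample points are hypothetical, and the SVD-based fit shown here is one common way to perform orthogonal regression, not necessarily the implementation used in this study.

```python
import numpy as np

def fit_line_orthogonal(points):
    """Fit a line to 2D points by orthogonal (total least-squares) regression.

    Returns (centroid, direction, rms): a point on the line, a unit direction
    vector, and the RMS perpendicular distance of the points to the line.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The first right-singular vector of the centered points is the direction
    # that minimizes the sum of squared perpendicular distances.
    _, s, vt = np.linalg.svd(pts - centroid, full_matrices=False)
    direction = vt[0]
    rms = s[1] / np.sqrt(len(pts))
    return centroid, direction, rms

# Two hypothetical edge fragments lying on y = 0.5 x + 1, merged into one fit.
frag_a = [(0.0, 1.0), (2.0, 2.0), (4.0, 3.0)]
frag_b = [(10.0, 6.0), (12.0, 7.0)]
centroid, direction, rms = fit_line_orthogonal(frag_a + frag_b)
slope = direction[1] / direction[0]  # sign of the direction vector cancels out
```

Unlike ordinary regression of y on x, this fit treats both coordinates symmetrically, so it behaves the same for near-vertical edges as for near-horizontal ones.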
An improved cascade Hough transformation approach is proposed to extract line segments from the edge pixels and to assign them to the probable vanishing point candidates. The two steps of the Hough transform are illustrated in Figure 1. The first Hough transform extracts line segments from the edge pixels. The initial vanishing point localization then uses the output of the first Hough transform to group the line segments passing through the same region of the image. The details of the two Hough transforms are described in Section 3.2 and Section 3.3, respectively.
  3.2. Feature Line Detection
The first Hough transform is commonly used to detect line segments in the image by keeping the dominant peaks in the normal-distance and normal-angle ($\rho$-$\theta$) space. The parameterization of the Hough transformation is based on the orthogonal distance $\rho$ of the line to the origin and the direction $\theta$ of the normal to the line. Each pixel $(x, y)$ forms a sinusoidal curve in the $\rho$-$\theta$ space:
$$\rho = x\cos\theta + y\sin\theta \qquad (1)$$
Thus, a set of points that form a straight line will produce sinusoids that cross at the specific $(\rho, \theta)$ of that line. Therefore, finding collinear points in the image can be converted to the problem of finding accumulated peaks in the $\rho$-$\theta$ space.
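The voting idea behind the first Hough transform can be sketched as follows, assuming the usual normal form in which a pixel (x, y) votes for every (ρ, θ) with ρ = x cos θ + y sin θ. The function names, bin sizes, and test points are illustrative assumptions; this brute-force accumulator is not the improved cascade implementation described in this study.

```python
import numpy as np

def hough_accumulate(points, n_theta=180, rho_res=1.0):
    """Vote in (rho, theta) space: each point adds one vote per theta column."""
    pts = np.asarray(points, dtype=float)
    thetas = np.deg2rad(np.arange(n_theta) * 180.0 / n_theta)
    rho_max = np.ceil(np.hypot(pts[:, 0], pts[:, 1]).max())
    n_rho = int(np.ceil(2 * rho_max / rho_res)) + 1
    acc = np.zeros((n_rho, n_theta), dtype=int)
    cols = np.arange(n_theta)
    for x, y in pts:
        rho = x * np.cos(thetas) + y * np.sin(thetas)  # sinusoid of this pixel
        rows = np.round((rho + rho_max) / rho_res).astype(int)
        acc[rows, cols] += 1
    return acc, thetas, rho_max

def strongest_line(acc, thetas, rho_max, rho_res=1.0):
    """Return (rho, theta) of the accumulator bin with the most votes."""
    r, t = np.unravel_index(np.argmax(acc), acc.shape)
    return r * rho_res - rho_max, thetas[t]

# Hypothetical edge pixels on the vertical line x = 10.
pts = [(10.0, float(y)) for y in range(21)]
acc, thetas, rho_max = hough_accumulate(pts)
rho, theta = strongest_line(acc, thetas, rho_max)
```

For short segments, several neighboring bins can collect the same number of votes, which is one reason peak selection and filtering matter in practice.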
Short lines or falsely detected edges will significantly decrease the accuracy of line clustering and vanishing point calculation, which makes feature line detection and filtering indispensable. A voting scheme is used to select candidate peaks from the accumulated histogram for collinearity detection in the $\rho$-$\theta$ space. For the best results, this study uses an inverted pyramid pattern of iterative calculation for the $\rho$-$\theta$ parameters, and the iteration stops when the detected vanishing points are stable, as explained in Section 3.3.
  3.3. Initial Vanishing Point Localization
According to the detected peaks in the first Hough transform, a second transformation of those peaks is employed to identify line segments passing through the same point (or small area) in the image. The local maximum peaks in the first Hough space appear to be collinear because that specific pixel (a possible vanishing point) in the image contributes to all of the corresponding $\rho$-$\theta$ accumulators.
Two conditions for obtaining stable vanishing points are considered. The first is the number of line segments in each direction, i.e., the line group number threshold is adjusted to prevent most detected lines from pointing in a single direction. The other is that the representative vanishing points should be stable under different $\rho$-$\theta$ parameters. Modifying the $\rho$-$\theta$ parameters can increase the reliability of the vanishing point calculation.
Afterward, similarity rectification is applied to identify different line segments. Then, a least-squares method is employed to trace line groups interactively by adjusting the threshold of histogram peaks until the required number of line groups is obtained. Finally, vanishing points are calculated from the grouped line segments and optimized by iterative calculation.
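As a rough illustration of the final step, the vanishing point of one line group can be estimated by least squares. The sketch assumes each grouped line is given in normal form, x cos θ + y sin θ = ρ, so that a common point (x_v, y_v) satisfies one linear equation per line. The function name and the synthetic line group are hypothetical, and the iterative optimization mentioned above is not reproduced.

```python
import numpy as np

def vanishing_point_lstsq(rhos, thetas):
    """Least-squares intersection of lines x*cos(theta) + y*sin(theta) = rho.

    Returns the estimated point and the signed perpendicular distance of
    that point to each line (the residuals).
    """
    rhos = np.asarray(rhos, dtype=float)
    thetas = np.asarray(thetas, dtype=float)
    A = np.column_stack([np.cos(thetas), np.sin(thetas)])
    vp, *_ = np.linalg.lstsq(A, rhos, rcond=None)
    residuals = A @ vp - rhos
    return vp, residuals

# Synthetic group: four lines that all pass through (300, -50).
true_vp = np.array([300.0, -50.0])
thetas = np.deg2rad([20.0, 35.0, 50.0, 80.0])
rhos = true_vp[0] * np.cos(thetas) + true_vp[1] * np.sin(thetas)
vp, residuals = vanishing_point_lstsq(rhos, thetas)
```

With noisy lines the residuals are no longer zero, and their spread gives a simple indication of how well the group agrees on a single vanishing point.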
Figure 2a is an example of an input image. Two groups of dots are marked as rectangles and triangles, respectively. Each set of four points forms a line, and the lines pass through two intersection points marked as dots. Figure 2b illustrates the result after the first Hough transformation, in which candidate peaks are marked as squares and triangles forming a line. An example of a voting scheme is shown in Figure 2c, in which the number in each accumulator represents how many lines pass through it. High peaks extracted from Figure 2b are transformed into lines in the second Hough transform, as demonstrated in Figure 2d. The intersections of the lines are marked as points representing their groups in rectangles and triangles, respectively.
If necessary, a third Hough transform can be applied to the peaks of the second one to detect collinear vanishing points. These kinds of features can be used to construct vanishing lines.
  4. Camera Calibration Using Vanishing Points
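Vanishing point-based calibration is considered one of the most practical calibration methods. The collinearity condition equations relate the geometric positions of the perspective center, an image point, and its corresponding object point as follows:
$$x_a = x_0 - f\,\frac{m_{11}(X_A - X_L) + m_{12}(Y_A - Y_L) + m_{13}(Z_A - Z_L)}{m_{31}(X_A - X_L) + m_{32}(Y_A - Y_L) + m_{33}(Z_A - Z_L)}, \qquad y_a = y_0 - f\,\frac{m_{21}(X_A - X_L) + m_{22}(Y_A - Y_L) + m_{23}(Z_A - Z_L)}{m_{31}(X_A - X_L) + m_{32}(Y_A - Y_L) + m_{33}(Z_A - Z_L)} \qquad (2)$$
where $x_a$ and $y_a$ are the image coordinates of a point; $X_A$, $Y_A$, and $Z_A$ are the object space coordinates; $X_L$, $Y_L$, and $Z_L$ are the coordinates of the perspective center; $f$ is the principal distance; $x_0$ and $y_0$ are the coordinates of the principal point; and $m_{ij}$ are the elements of the 3 × 3 rotation matrix $M$, which is composed from three rotation angles (pan, tilt, and swing) [34]:
$$M = \begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{bmatrix} \qquad (3)$$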
The camera and object coordinate systems are illustrated with the geometric relations between the vanishing points, the image plane, and the center of the camera in Figure 3.
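Under the assumption of well-calibrated lens distortion of the perspective system, the principal point $(x_0, y_0)$, the principal distance $f$, and the three rotation angles can be estimated using the collinearity condition equations and three mutually orthogonal vanishing points. $VP_X$, $VP_Y$, and $VP_Z$ are the three vanishing points intersected from the parallel lines along the X, Y, and Z axes in the object space, respectively [35]. Therefore, one can assume that the vanishing points lie at infinity in the object space. For example, as $X_A \to \infty$, the vanishing point $VP_X = (x_{VX}, y_{VX})$ following Equation (2) can be rewritten as:
$$x_{VX} = x_0 - f\,\frac{m_{11}}{m_{31}}, \qquad y_{VX} = y_0 - f\,\frac{m_{21}}{m_{31}}$$
Similarly, assuming $Y_A \to \infty$ and $Z_A \to \infty$, $VP_Y = (x_{VY}, y_{VY})$ and $VP_Z = (x_{VZ}, y_{VZ})$ can be derived as
$$x_{VY} = x_0 - f\,\frac{m_{12}}{m_{32}}, \qquad y_{VY} = y_0 - f\,\frac{m_{22}}{m_{32}}$$
$$x_{VZ} = x_0 - f\,\frac{m_{13}}{m_{33}}, \qquad y_{VZ} = y_0 - f\,\frac{m_{23}}{m_{33}}$$
where $m_{ij}$ are the elements of the rotation matrix $M$.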
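From Equation (4) to Equation (7), the three vanishing points depend only on the principal point $(x_0, y_0)$, the principal distance $f$, and the three rotation angles. These six unknowns can be solved using the three vanishing points, each of which contributes two image coordinates.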
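The limiting argument can be checked numerically with a short sketch. It assumes the standard collinearity form x = x_0 − f·(row 1 of M applied to the object vector)/(row 3 applied to the same vector), and similarly for y; the rotation used below is an arbitrary test rotation rather than the pan, tilt, and swing convention of this study, and all names and values are hypothetical.

```python
import numpy as np

def rotation_matrix(a, b, c):
    """Arbitrary test rotation M = Rz(c) @ Rx(a) @ Ry(b) (an assumed convention)."""
    ca, sa = np.cos(a), np.sin(a)
    cb, sb = np.cos(b), np.sin(b)
    cc, sc = np.cos(c), np.sin(c)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cc, -sc, 0], [sc, cc, 0], [0, 0, 1]])
    return Rz @ Rx @ Ry

def project(P, L, M, f, x0, y0):
    """Collinearity equations: image coordinates of object point P seen from L."""
    d = M @ (np.asarray(P, dtype=float) - np.asarray(L, dtype=float))
    return np.array([x0 - f * d[0] / d[2], y0 - f * d[1] / d[2]])

def vanishing_points(M, f, x0, y0):
    """Rows are the vanishing points of the object X, Y, and Z axes."""
    return np.array([[x0 - f * M[0, j] / M[2, j],
                      y0 - f * M[1, j] / M[2, j]] for j in range(3)])

M = rotation_matrix(1.1, 0.4, 0.3)
f, x0, y0 = 800.0, 320.0, 240.0
L = np.array([1.0, 2.0, 3.0])          # perspective center (hypothetical)
vps = vanishing_points(M, f, x0, y0)
# A point very far along the object X axis projects onto the X vanishing point.
far_x = project([1e9, 5.0, 7.0], L, M, f, x0, y0)
```

The perspective center drops out of the result, which is why the vanishing points constrain only the interior parameters and the rotation.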
  4.1. Camera Orientation Calibration
A pair of vanishing points can be used to define a vector (vanishing line), and thus, three vectors can be found from the combination of three vanishing points:
The slope (
m) of these three vanishing lines can be calculated as
        
Hence, 
 can be determined from 
 as shown in Equation (
9).
Angle 
 and 
 are therefore estimated from the multiplication and division of Equations (
10) and (
11), rewritten as Equations (
12) and (
13), respectively.
        
  4.2. Camera IOP Calibration
Each vanishing point and the principal point can also be used to define a vector; therefore, three vectors can be found from the three vanishing points:
The orthocenter of the triangle formed by the three vanishing points of the three mutually orthogonal directions identifies the principal point through the inner products of the sides of the triangle and its altitudes. For instance, the vector from the principal point to one vanishing point is perpendicular to the side joining the other two vanishing points, so their inner product is equal to zero; the same holds for the vector from the principal point to a second vanishing point and its opposite side. Hence, the principal point can be solved by expanding these two simultaneous equations.
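The focal length, f, can be computed afterwards as the square root of the product of the distance from the principal point to any of the triangle’s vertices and the distance from the principal point to the side opposite that vertex:
$$f = \sqrt{d(p, v_i)\, d(p, s_i)} \qquad (16)$$
where $p$ is the principal point, $v_i$ is one of the vanishing points, $s_i$ is the side of the triangle opposite $v_i$, and $d(\cdot,\cdot)$ denotes the Euclidean distance in the image plane.
According to the derivation above, the standard procedure of the three vanishing point-based camera calibration starts from the rotation angle estimation (Equations (9)–(11)), followed by the principal point calculation, and finally, the focal length from Equation (16). The six unknowns can thus be solved with a unique solution.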
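The two IOP steps can be sketched in a few lines, assuming three finite, mutually orthogonal vanishing points given in pixel coordinates. The function names are hypothetical, and the test points were generated synthetically from a principal point of (320, 240) and a focal length of 800 pixels so that the result can be checked.

```python
import numpy as np

def principal_point(v1, v2, v3):
    """Orthocenter of the vanishing point triangle.

    Solves (p - v1).(v2 - v3) = 0 and (p - v2).(v1 - v3) = 0 for p.
    """
    v1, v2, v3 = (np.asarray(v, dtype=float) for v in (v1, v2, v3))
    A = np.array([v2 - v3, v1 - v3])
    b = np.array([v1 @ (v2 - v3), v2 @ (v1 - v3)])
    return np.linalg.solve(A, b)

def focal_length(v1, v2, v3, pp):
    """sqrt(distance from pp to vertex v1 * distance from pp to side v2-v3)."""
    v1, v2, v3, pp = (np.asarray(v, dtype=float) for v in (v1, v2, v3, pp))
    d_vertex = np.linalg.norm(v1 - pp)
    side = v3 - v2
    # Perpendicular distance from pp to the line through v2 and v3.
    cross = side[0] * (pp[1] - v2[1]) - side[1] * (pp[0] - v2[0])
    d_side = abs(cross) / np.linalg.norm(side)
    return np.sqrt(d_vertex * d_side)

# Synthetic vanishing points of three orthogonal directions.
v1, v2, v3 = (720.0, 1040.0), (-480.0, -160.0), (1920.0, -1360.0)
pp = principal_point(v1, v2, v3)      # approximately [320, 240]
f = focal_length(v1, v2, v3, pp)      # approximately 800
```

The linear system becomes ill-conditioned when one vanishing point moves toward infinity, which is consistent with the sensitivity of weak perspective geometry discussed later in the paper.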
If and only if the image is un-cropped and captured by a pinhole camera, the lens distortion calibration can also be achieved using vanishing points. The most commonly encountered lens distortion is radial distortion, including barrel and pincushion distortion. The standard model is formulated as:
        where 
 and 
 are the corresponding image coordinates with distortion; 
, and 
 are the coefficients of radial distortion, and 
 is the distorted radius.
To find the distortion parameters 
 and 
, this study follows the fundamental property of the perspective camera model. Vanishing points provide a useful constraint for estimating the radial distortion parameters through the line-fitting adjustment of image points observed from the corresponding vanishing point. The observed image lines are constrained to converge to their corresponding vanishing point 
 according to the following equation:
Incorporating the symmetric radial distortion parameters into Equation (18) gives:
When  and  are obtained, the line best-fit parameters of  and  can be estimated using a least median of squares (LMedS) procedure.
  5. Vanishing Point Refinement
Vanishing points are imaginary points at an infinite distance from the projection center. Therefore, their exact locations cannot be measured directly. It is difficult to extract vanishing points without random or systematic errors, especially for images with weak perspective geometry (e.g., long focal length). Consequently, increasing the reliability of the vanishing point positions is an important task for vanishing point-based camera calibration. The proposed vanishing point refinement process described here minimizes both random and systematic errors based on constraints derived from common geometric properties of man-made structures. For instance, feature points pertaining to the same (flat) roof, building base, or floor should have the same height or planar coordinates; however, biases may occur because of computational errors. The systematic error, assessed from the consistency of the perspective projection, thus provides an indication for fine-tuning the positions of the vanishing points.
  5.1. Feature Point Selection and Base Point Estimation
To estimate the perspective projection consistency using vanishing points, it is necessary to find sufficient feature points and their corresponding base points on the reference plane. The most common features of artificial structures are corner points at the intersections of edges, planes, or boundaries. Thus, detecting feature points from the extracted long edges is more reliable than detecting them from the raw image. Short segments and small closed polygons among the detected segments can be ignored because most of them are windows, patterns, or minor structures. The candidate feature points are then detected using the Harris corner detector [36]. Some of the geometric constraints can be used for filtering feature point candidates. The first task is to define a reference plane with a reference origin and vanishing points along the X and Y axes, where the origin (O) is normally formulated as,
        
The reference origin is defined as the intersection point from the bottom edges along the X and Y axes of the main structure. Candidate feature points below the reference plane or collinear to others can also be removed. However, the proposed procedure requires user interaction to make the final selection. The next task is to estimate base points. A base point is the vertical projection of a feature point onto the reference plane. It is a necessary element to estimate the consistency of perspective projection constructed from vanishing points. However, most base points are hidden in the image because of self-occlusion; only a few of them may have the potential to be extracted directly from the raw image. For estimating the corresponding base points, the proposed process is based on the characteristics of vanishing point constraints, and is an automatic and robust solution. 
Figure 4 illustrates a procedure for predicting base points. The extracted feature points are marked in round blue dots in 
Figure 4a. Following the assumption of the collinearity of the feature point (a) and the base point (b), the search area can be one-dimensional along the 
. The task now is simplified into finding the horizontal location of the base point on 
.
The estimation process can be generalized into three steps. First, all feature points are projected onto the Y–Z plane according to 
 and 
. The projected feature points are marked in red triangles as illustrated in 
Figure 4b. Points with the same height level and the same Y coordinate should overlap at the same projection point. Similarly, feature points can also be projected onto the X-Z plane using 
 and 
. Secondly, the red triangles can be further projected onto the Y or X axis, as noted in green squares in 
Figure 4c along 
 to locate the Y coordinate of each feature point. Finally, candidate positions of the base points located on the line are linked from green squares to 
 (or 
) as shown in 
Figure 4d defining the horizontal locations of the base points. The intersection points to the lines in 
Figure 4d and 
 are the estimated locations of the base points (red circles in 
Figure 4e) that are corresponding to feature points according to the path record. In 
Figure 4f, the green lines represent the target heights between the feature points and their corresponding base points which will be determined in the following procedure.
  5.2. Vanishing Point Fine-Tuning
The error of each set of grouped projection points during the base point estimation process can be minimized by fine-tuning the positions of the vanishing points. Vanishing point localization errors cause displacement during the projection process; therefore, the calibration results normally include systematic errors. Feature points with the same height and Y coordinate should project onto the same projection point, as shown in Figure 5a, and points with the same Y coordinate should also overlap at the same position, as marked in Figure 5b.
The divergences in the first and second projection steps provide useful information for vanishing point refinement. The more precisely the vanishing point positions are estimated, the fewer the divergences that may occur during the projection process.
To decide the fine-tuning values and order, a moving pixel pyramid and a half-and-half adjustment strategy were developed in the proposed algorithm. The objective is to minimize the standard deviation of each of the clustered projection points,
        
        where 
 are the fine-tuning pixels in the image space for each vanishing point; and 
 is the projected feature point in each step that belongs to group 
k with a mean value of 
. Every fine-tuned pixel will update the standard deviation for each group. However, the traditional moving pixel approach takes 
 iterations for each vanishing point, where 
N is the number of moving pixels along horizontal and vertical axes. The proposed moving pixel pyramid is a coarse-to-fine approach, fine-tuning the vanishing points from large pixel spans to the sub-pixel level with 
 computational complexity. The fine-tuning begins with larger pixel spans to locate a coarse area with the lowest standard deviation value. Then, the span pixel value is reduced to zoom in to a smaller area, until the iteration ends with the sub-pixel leveled fine-tuning. 
Figure 6 demonstrates an example of the pyramid fine-tuning approach, in which the initial vanishing point is at the center and the search boundary extends from −50 to 50 pixels in both directions, i.e., from top-left [−50, −50] to bottom-right [50, 50]. The first fine-tuning span is 20 pixels. After the vanishing point position has been estimated and updated 25 times, the process zooms in to the next level with a span of four pixels. The same procedure continues and the vanishing point can be fine-tuned to an ideal position after 
 iterations instead of 
 iterations using the traditional pixel-by-pixel searching approach.
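The coarse-to-fine idea can be sketched as a generic grid search that shrinks its span at each level. The spans of 20 and 4 pixels and the 5 × 5 grid (25 evaluations per level) follow the example in the text; the finer spans of 0.8 and 0.16 pixels, the function name, and the quadratic stand-in cost are assumptions made only for illustration. In the proposed method the cost would instead be the standard deviation of the clustered projection points.

```python
import numpy as np

def pyramid_search(cost, center, spans=(20.0, 4.0, 0.8, 0.16), half_steps=2):
    """Coarse-to-fine grid search around `center`.

    At each level, a (2*half_steps+1)^2 grid with the given span is evaluated
    around the best position found so far; the best candidate becomes the
    center of the next, finer level.
    """
    best = np.asarray(center, dtype=float)
    best_cost = cost(best)
    evaluations = 1
    offsets = np.arange(-half_steps, half_steps + 1)
    for span in spans:
        level_center = best.copy()
        for dx in offsets:
            for dy in offsets:
                cand = level_center + span * np.array([dx, dy])
                c = cost(cand)
                evaluations += 1
                if c < best_cost:
                    best, best_cost = cand, c
    return best, best_cost, evaluations

# Stand-in cost with a known minimum (not the paper's projection statistic).
target = np.array([13.3, -27.1])
cost = lambda v: float(np.sum((v - target) ** 2))
best, best_cost, evaluations = pyramid_search(cost, center=(0.0, 0.0))
```

Each level costs a fixed number of evaluations, so the total grows with the number of levels rather than with the number of pixels in the search window.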
The fine-tuning process is based on statistical estimates, and it is difficult to determine which vanishing point localization displacement affects the overall error most. In the case of weak perspective geometry, one of the vanishing points may contain a larger error than the others, but the fine-tuning process may instead reduce the error by moving a vanishing point that should not be adjusted. The proposed half-and-half adjustment strategy can reduce the error caused by a large vanishing point displacement. This strategy first moves each vanishing point half of the distance from its original position to its optimized position. The adjustment that provides the greatest contribution is then identified, and that specific vanishing point is fully adjusted to its optimized position.
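One possible reading of the half-and-half strategy is sketched below: each vanishing point is moved halfway toward its optimized position in turn, the resulting reduction of the cost is recorded, and only the vanishing point with the largest reduction is moved all the way. The function name, the cost function, and the sample coordinates are hypothetical, and details such as how many rounds are run are not specified in the text.

```python
import numpy as np

def half_and_half_step(cost, vps, optimized):
    """One round of the half-and-half adjustment (an interpretive sketch).

    vps, optimized: arrays of shape (n, 2) with current and optimized positions.
    Returns the updated positions, the index of the adjusted vanishing point,
    and the cost reduction obtained by each half move.
    """
    vps = np.asarray(vps, dtype=float)
    optimized = np.asarray(optimized, dtype=float)
    base = cost(vps)
    gains = []
    for i in range(len(vps)):
        trial = vps.copy()
        trial[i] = 0.5 * (vps[i] + optimized[i])   # half-distance move
        gains.append(base - cost(trial))
    k = int(np.argmax(gains))
    updated = vps.copy()
    if gains[k] > 0:
        updated[k] = optimized[k]                  # full move for the winner only
    return updated, k, gains

# Stand-in example: the second vanishing point carries the largest error.
truth = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
initial = truth + np.array([[1.0, 0.0], [30.0, 0.0], [0.0, 2.0]])
cost = lambda v: float(np.sum((v - truth) ** 2))
updated, k, gains = half_and_half_step(cost, initial, optimized=truth)
```

In this example only the vanishing point with the 30-pixel offset is moved, while the two nearly correct points are left untouched.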
Geometrically, modifying 
 and 
 along the vertical direction in image space changes the vanishing line slope. Varying the slope of a vanishing line means the adjustment of the reference plane for optimizing all feature points perpendicular to it. An incorrect slope of the vanishing line estimation will cause a tapering effect shrinking to one side of the vanishing point and enlarging on the other side to the line segments that should have the same length. The horizontal change of 
 and 
 will resize the area formed by three vanishing points, which refers to the focal length calibration as mentioned in Equation (
16).
  6. Results and Discussion
A computer-simulated model test case (Figure 7a) was used to demonstrate the developed vanishing point estimation and refinement process step by step. The simulation image was generated using Trimble SketchUp software. After the 3D model was created, it was rendered to an image with the perspective matrix projection camera module. Strong edge pixels were extracted from the raw image and converted into the first Hough space for line detection. The high peaks in the first Hough transform were extracted with local maximum suppression, removing duplicated candidates due to over-segmentation. The extracted peaks represent the line equations in the image space. To clarify which lines belong to which vanishing point, the high peaks are further transformed into the second Hough space. The transformed peaks intersect at the same bin if they belong to the same vanishing point in the image space. The classified result (Figure 7c) consists of three groups of detected line equations marked with different colors.
The initial vanishing points are then used with selected feature points for searching for corresponding base points. A step-by-step base point estimation process is illustrated in 
Figure 8. All selected blue feature points are first projected onto the 
 plane from 
, marked as red triangles (
Figure 8a). Those red triangles are further projected to the direction of 
 onto the 
Y axis, marked as green triangles (
Figure 8b). Finally, the intersection of 
 and lines linked from green triangles to 
 indicate the base points (
Figure 8c).
Figure 9 displays three enlarged parts of the projection process from 
Figure 8d. Several projection lines passing through 
 are not well-overlapped. There appears to be a systematic misalignment due to vanishing point displacements.
Figure 10 demonstrates the estimated base points located on the reference plane. Several points should have overlapped at identical coordinates. However, the error in the initial vanishing point estimation caused projection errors in each projection step.
 The proposed fine-tuning algorithm was applied to reduce the divergences. 
Figure 11 compares the conventional moving pixel-based approach (Figure 11a) and the proposed coarse-to-fine approach (Figure 11b,c). This test case used a 10 × 10 coarse-to-fine moving pixel pyramid. The sampling spans were 25, 10, 3, and 0.5 pixels for the successive levels.
The fine-tuning is based on the statistical value of the local (red triangles) or global (green triangles) divergences. Reducing the standard deviation of the red triangle groups will decrease the local error of each point group. Minimizing the standard deviation of the red triangles, however, may increase the global error. For instance, two red triangle groups at different heights should belong to the same green triangle group. Fine-tuning the vanishing points may lead to divergences when projecting these two red triangle groups onto the Y axis.
To validate the robustness of the proposed refinement process, IOPs and EOPs were calculated with several additional offset errors manually added to the initial vanishing point 
 (
Table 1). The listed results show that the EOP differences are less than 
 and the IOP differences are less than 3 pixels, a significant decrease in errors and a consistent improvement of the 3D point measurement accuracy. Figures 12, 13, and 14 display the robustness of the refinement process, demonstrating that the proposed refinement is capable of reducing the uncertainty of the initial vanishing point estimation.
Figure 15 was extracted from a video sequence with a dimension of 704 × 480 pixels in JPEG format. The estimated radial distortion coefficients 
 and 
 are 
 and 
, respectively, calibrated using straight line segments. 
Figure 15b shows the extracted lines with feature points (
a to 
g) of the targeted building. This case assumed the back side of the buildings have the same 
 coordinates as the targeted structure on the left (Feature 
a, 
b and 
c).
Classified line segments are then used to estimate the initial locations of the vanishing points. Because the tilt angle is low in this case, the vanishing point in the vertical (Z) direction is far away from the center (Figure 16). Table 2 lists the calibrated camera parameters with and without the proposed vanishing point refinement process using both a raw image and a lens distortion-calibrated image.
The reconstructed model was compared with field-surveyed data for quantitative analysis of the accuracy (Table 3), which also evaluated the accuracy improvement of the building height estimation after the refinement of vanishing points. Using the measured distance (17.4 m) between feature point d and its corresponding base point as the reference, the maximum error of feature point f is about 3%. After the vanishing point refinement, not only were feature points of the same level correctly assigned identical heights, but the overall RMSE decreased to less than 0.7%. Validations of 3D point measurements (Table 4) were compared with field-surveyed data from tape and laser measurements.
  7. Conclusions
This paper presented a novel camera calibration approach based on vanishing point geometry. The proposed algorithms can be used to obtain reliable camera parameters without prior information and are particularly useful for on-site camera calibration. The proposed algorithms can also deal with the uncertainty of vanishing point calculation, which may significantly affect the estimation of the camera IOPs and EOPs. The main contribution of this study is the proposed vanishing point refinement strategy, which can significantly reduce the systematic and random errors stemming from vanishing point localization. The fine-tuning process can minimize the projection error of each feature point after a few iterations using the half-and-half adjustment. A coarse-to-fine fine-tuning approach is also proposed to improve the processing efficiency over the traditional pixel-by-pixel search. To extract and group line segments for initial vanishing point estimation, this study improved the cascade Hough transformation with adaptive thresholds. Extracted line segments are more robustly classified to the corresponding vanishing point.
The experimental results shown in this paper also demonstrate the robustness of the proposed refinement approach under high initial vanishing point estimation errors, improving 3D point reconstruction accuracy by 30% and keeping the estimated camera parameters consistent under additional vanishing point localization errors. A video frame case evaluated the improvement provided by the proposed vanishing point refinement process: the height measurement error was reduced from 2.04% to 0.64%. The proposed algorithms can be applied to in situ camera calibration, single view metrology, and simultaneous localization and mapping (SLAM) applications in human-made environments. Future improvements will focus on integration into a SLAM system as a real-time camera pose tracking attribute. The developed calibration strategy also has great potential for implementation with panoramic and omnidirectional cameras.