An Incremental Target-Adapted Strategy for Active Geometric Calibration of Projector-Camera Systems

The calibration of a projector-camera system is an essential step toward accurate 3-D measurement and environment-aware data projection applications, such as augmented reality. In this paper we present a two-stage easy-to-deploy strategy for robust calibration of both intrinsic and extrinsic parameters of a projector. Two key components of the system are the automatic generation of projected light patterns and the incremental calibration process. Based on the incremental strategy, the calibration process first establishes a set of initial parameters, and then it upgrades these parameters incrementally using the projection and captured images of dynamically-generated calibration patterns. The scene-driven light patterns allow the system to adapt itself to the pose of the calibration target, such that the difficulty in feature detection is greatly lowered. The strategy forms a closed-loop system that performs self-correction as more and more observations become available. Compared to the conventional method, which requires a time-consuming process for the acquisition of dense pixel correspondences, the proposed method deploys a homography-based coordinate computation, allowing the calibration time to be dramatically reduced. The experimental results indicate that an improvement of 70% in reprojection errors is achievable and 95% of the calibration time can be saved.


Introduction
One of the most fundamental problems in the field of computer vision is how to estimate the geometric parameters of an image sensor. When the image sensor is coupled with a light projector, the pair forms an active vision system. The performance of such an active vision-based measuring instrument relies heavily on an accurate calibration procedure to determine the geometric parameters of the paired image sensor and light projector. The Microsoft Kinect™ is perhaps one of the best-known examples [1]. Aside from the Kinect's popularity, today's off-the-shelf video projectors are widely adopted to build 3-D scanners due to their cost efficiency and availability [2][3][4]. Knowing the geometric parameters of a projector also opens it to a wider range of applications, such as augmented reality and the performing arts (e.g., [5,6]). Interest in calibrating video projectors has therefore increased significantly in the last decade (see [4,5,7-12] for example).
A projector can be effectively described by the pinhole camera model. It is well-known that the geometric parameters of a pinhole camera can be estimated from the world-image correspondences of a set of control points [13,14]. Therefore it is possible to simultaneously calibrate both the camera and the projector using the same object. However, calibrating a projector is not as trivial as calibrating a camera since there is no straightforward way to observe what a projector "sees", making the establishment of the projector-world correspondences a challenging task.
One approach is to reconstruct the view of the projector from actively acquired camera-projector correspondences (see Figure 1 for example). In order to sample as many control points as possible in the reconstructed view, the process requires establishing a dense point-wise mapping from the projection screen to the image plane with sub-pixel precision. This usually involves projecting a sequence of temporally coded light patterns, which is not only time-consuming but also makes it difficult to classify pixels on the stripe boundaries [15]. As a result, dense correspondences come at the cost of either reduced accuracy or increased scanning time, neither of which is desirable in the calibration process.
In the literature, a typical alternative strategy is to project some easy-to-identify features onto a calibration target (a plane in most cases), which is associated with the so-called world (or global, or object) coordinate system. By analyzing images of the calibration target, a set of projector-world

Related Work
The calibration of video projectors has recently received a lot of attention in the field of computer vision. Related work in the literature can be categorized into either photometric or geometric calibration; in this paper we focus on geometric calibration. Since a projector can be described as an inverse camera, many works use the same calibration object to estimate the geometric parameters of the projector while calibrating the camera (e.g., [4,7]). These methods require the acquisition of dense stereo correspondences so that a mapping of control points from the projector screen to world coordinates can be obtained. To achieve better accuracy, the calibration object is repositioned many times and the scanning procedure is performed repeatedly, which greatly increases the calibration time. The experimental results of [4] and [7] show reprojection errors of 0.224 and 0.113 pixels, respectively.
In [8], line patterns are used to find sparse projector-world point correspondences without projecting sequences of encoded light patterns, achieving a reprojection error of 0.428 pixels. That work assumes that the projector follows the linear projection model. In practice, some projectors exhibit non-negligible radial lens distortion, as we observed in our experiments, in which case the estimated parameters may be far from accurate. Some methods (e.g., [10,11]) suggest using another "projector-friendly" object (e.g., a white board) on which the projected calibration patterns are easy to locate. An obvious drawback is that two different targets are then required to calibrate a camera-projector system.
There are also methods that use special devices to overcome the interference between the printed calibration pattern and the projected patterns. For instance, Zhan et al. use an LCD monitor as the calibration target [12]. The panel displays a checkerboard pattern while the camera is calibrated and is turned off during the projection of light patterns. They achieved a reprojection error of around 0.4 pixels.

Nonlinear Projection Model and Geometric Calibration
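Adopting a model that accurately describes the geometric imaging or projection behavior of a device is critical to the performance of calibration. It has been reported that, like image sensors, an off-the-shelf video projector may exhibit significant lens distortion due to nonlinear factors that cannot be compensated by the classical pinhole camera model [16]. Therefore, we adopt a modified pinhole camera model with nonlinear correction of radial and tangential lens distortion [13]. Under the nonlinear model, a 3-D point $\mathbf{X} = (X, Y, Z)^\top$ expressed in the world coordinate system is first projected onto a point $(x, y)$ in the normalized ideal image plane using:

$$\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \sim \mathbf{M} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \qquad (1)$$

where $\sim$ denotes equality up to scale, and:

$$\mathbf{M} = \left[\, \mathbf{R} \mid \mathbf{t} \,\right] \qquad (2)$$

contains the extrinsic parameters that transform points to the camera-centered coordinate system by the $3 \times 3$ rotation matrix $\mathbf{R}$ and the translation 3-vector $\mathbf{t}$. The following nonlinear model is then applied to approximate the distorted point $(x_d, y_d)$ in normalized coordinates:

$$\begin{pmatrix} x_d \\ y_d \end{pmatrix} = \left(1 + k_1 r^2 + k_2 r^4 + k_5 r^6\right) \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 2 k_3 x y + k_4 \left(r^2 + 2 x^2\right) \\ k_3 \left(r^2 + 2 y^2\right) + 2 k_4 x y \end{pmatrix} \qquad (3)$$

with the radial term $r^2 = x^2 + y^2$ and the distortion coefficients $k_1, k_2, k_3, k_4, k_5$. The normalized coordinates $(x_d, y_d)$ can be converted to pixel coordinates $(u, v)$ as:

$$u = f_x \, x_d + u_0, \qquad v = f_y \, y_d + v_0 \qquad (4)$$

where $f_x$ and $f_y$ are the effective focal lengths in the horizontal and vertical directions, respectively, and $(u_0, v_0)$ is the point where the optical axis passes through the image plane. These parameters, together with the distortion coefficients, are the intrinsic parameters of a nonlinear pinhole camera.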
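To make the forward model concrete, here is a minimal plain-Python sketch that takes a world point through the rigid transform, the perspective division, the distortion model, and the pixel conversion described above. The function name is ours, and so is the coefficient ordering (k1, k2, k5 radial; k3, k4 tangential), borrowed from the widely used Bouguet-toolbox convention; the text itself does not fix the ordering.

```python
def project_point(X, R, t, fx, fy, u0, v0, k):
    """Map a world point to distorted pixel coordinates (pinhole + lens distortion).

    X: world point (X, Y, Z); R: 3x3 rotation as nested lists; t: translation.
    k: (k1, k2, k3, k4, k5); we ASSUME k1, k2, k5 are radial and k3, k4 are
    tangential, following the common Bouguet-toolbox ordering.
    """
    # Rigid transform into the device-centred frame.
    Xc = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
    # Perspective division onto the normalized ideal image plane.
    x, y = Xc[0] / Xc[2], Xc[1] / Xc[2]
    k1, k2, k3, k4, k5 = k
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k5 * r2 ** 3
    # Radial scaling plus tangential (decentering) terms.
    xd = x * radial + 2 * k3 * x * y + k4 * (r2 + 2 * x * x)
    yd = y * radial + k3 * (r2 + 2 * y * y) + 2 * k4 * x * y
    # Effective focal lengths and principal point give pixel coordinates.
    return fx * xd + u0, fy * yd + v0


I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(project_point((1, 2, 4), I3, (0, 0, 0), 100, 100, 320, 240, (0, 0, 0, 0, 0)))
# (345.0, 290.0)
```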
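The projection can be denoted by a nonlinear 2-vector function $\Phi(\mathbf{X}; \boldsymbol{\theta}, \mathbf{R}, \mathbf{t})$ parameterized over the intrinsic parameters $\boldsymbol{\theta} = (f_x, f_y, u_0, v_0, k_1, \dots, k_5)$ and the extrinsic components $\mathbf{R}, \mathbf{t}$. Given a set of world-image point correspondences $(\mathbf{X}_{ij}, u_{ij}, v_{ij})$ captured from multiple views $i$, one may recover the parameters of $\Phi$. In this work we apply Zhang's calibration method [13] to solve for the linear parameters $f_x, f_y, u_0, v_0$ and the per-view extrinsics $\mathbf{R}_i, \mathbf{t}_i$ first, and then fit the result into the nonlinear model, taking the distortion coefficients $k_1, \dots, k_5$ into account, by minimizing a least-squares function of the reprojection error. As suggested in [14], the reprojection errors in the horizontal and vertical directions should be treated separately; we therefore define the error functions as:

$$e^{x}_{ij} = u_{ij} - \Phi_x(\mathbf{X}_{ij}; \boldsymbol{\theta}, \mathbf{R}_i, \mathbf{t}_i) \qquad (5)$$

$$e^{y}_{ij} = v_{ij} - \Phi_y(\mathbf{X}_{ij}; \boldsymbol{\theta}, \mathbf{R}_i, \mathbf{t}_i) \qquad (6)$$

and apply an implementation of the Levenberg-Marquardt algorithm to search for the best-fitting parameters.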
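The per-axis error terms can be sketched as the kind of residual vector a Levenberg-Marquardt solver consumes. This is an illustrative sketch, not the authors' optimizer: `project` is a hypothetical callable standing in for Φ with fixed parameters, and the observations are toy values.

```python
import math


def reprojection_residuals(observations, project):
    """Concatenate horizontal and vertical reprojection errors.

    observations: list of (world_point, (u, v)) pairs, possibly from many views.
    project: callable world_point -> predicted (u, v); a hypothetical stand-in
             for the projection function Phi with its current parameters.
    """
    ex, ey = [], []
    for X, (u, v) in observations:
        u_hat, v_hat = project(X)
        ex.append(u - u_hat)  # horizontal error term
        ey.append(v - v_hat)  # vertical error term
    return ex + ey


def rms(residuals):
    """Root-mean-square of a residual vector."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


obs = [((1, 2, 0), (1.5, 2.0)), ((0, 0, 0), (0.0, -0.5))]
res = reprojection_residuals(obs, lambda X: (X[0], X[1]))
print(res, round(rms(res), 4))  # [0.5, 0.0, 0.0, -0.5] 0.3536
```

To fit parameters, one would wrap this in a function of the parameter vector and pass it to a solver such as `scipy.optimize.least_squares(fun, x0, method="lm")`. Keeping the x and y errors as separate entries also makes it easy to report or weight the two directions individually.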

Incremental Calibration Framework
In this section we present a framework that begins with a few projector-world correspondences and continuously refines the estimated parameters of both the image sensor and the video projector. The proposed calibration procedure works as follows:
1. Several sets of initial world-camera and world-projector correspondences are first collected. This is typically a rapid process using one-shot pattern projection.
2. Initial camera and projector parameters are calculated.
3. An image of the calibration target is captured to calculate its pose.
4. Positions of good feature points that are ideal for projector calibration are calculated using the initial parameters and the estimated pose.
5. The pattern renderer generates a calibration pattern according to the calculated positions.
6. Feature points are projected, tracked, and matched to their ideal positions.
7. According to the observed deviation, the projector-world correspondences are updated and the parameters are re-calculated.
8. The process repeats Steps 3 to 7 until the parameters converge.
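The eight steps form a loop that can be sketched as follows. The callbacks `capture_view` and `calibrate`, the convergence test on the change in RMS error, and the `tol` and `max_views` defaults are hypothetical choices for illustration; the text only states that Steps 3 to 7 repeat until the parameters converge.

```python
def incremental_calibration(bootstrap, capture_view, calibrate, tol=1e-3, max_views=30):
    """Closed-loop skeleton of Steps 1-8; all callbacks are hypothetical.

    bootstrap: initial correspondences from one-shot patterns (Step 1).
    capture_view(params): performs Steps 3-6 for the next target pose and
        returns a new set of correspondences, or None if the view is rejected.
    calibrate(dataset): returns (params, rms_error) for all data so far.
    """
    dataset = list(bootstrap)
    params, err = calibrate(dataset)  # Step 2: initial parameters
    history = [err]
    for _ in range(max_views):
        view = capture_view(params)  # Steps 3-6 use the current parameters
        if view is None:
            continue  # pattern missed the target; keep the old parameters
        dataset.append(view)
        params, err = calibrate(dataset)  # Step 7: re-calibrate with all views
        history.append(err)
        if abs(history[-2] - history[-1]) < tol:  # Step 8: converged
            break
    return params, history


# Toy stand-ins: the "error" simply shrinks as 1 / (number of views).
params, history = incremental_calibration(
    bootstrap=[0, 1, 2, 3],
    capture_view=lambda p: "view",
    calibrate=lambda d: (len(d), 1.0 / len(d)),
    tol=0.02,
)
print(params, len(history))  # 8 5
```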

Framework
In each iteration, the control points $\mathbf{X}_j$ in world coordinates are projected onto the projector's screen using the estimated extrinsic parameters and the previously calculated intrinsic parameters, resulting in the locations $\mathbf{q}_j$ of the feature points on the projection screen. According to $\mathbf{q}_j$, a calibration pattern is rendered and projected onto the scene. Due to the error in the calibrated parameters, the actual locations of the projected features, $\tilde{\mathbf{X}}_j$, will differ from $\mathbf{X}_j$, the estimated locations on the calibration board. We use image feedback to correct this. To find more accurate correspondences $(\mathbf{q}_j, \tilde{\mathbf{X}}_j)$, the feature points $\tilde{\mathbf{m}}_j$ are extracted from the captured images for further analysis. These points have to be associated with $\mathbf{q}_j$ to form calibration data for the projector. The matching can be performed efficiently if some hints are available; hence we use the projection of $\mathbf{X}_j$ onto the image plane, denoted by $\mathbf{m}_j$, as the starting point of the search. Details of the generation and analysis of the calibration patterns are given in Section 5.
In the final step, the matched image points are transformed to world coordinates via the camera-world homography (see Section 4.3 for the computation and use of the homography). Once the world-projector correspondences are ready, the system performs a multiple-view calibration algorithm, which also takes into account the calibration data collected from the previous $n-1$ viewpoints (for the $n$-th view), to compute refined intrinsic parameters and new extrinsic parameters for the projector.

Continuous Calibration and Estimation of Extrinsic Parameters
Collecting calibration data from multiple viewing directions is an important basis for ensuring that the calibrated projective parameters generalize well over a wide range of 3-D space. The process described in the previous section continuously tracks the position of the calibration target and calibrates the devices in real time. In order to project markers onto specified locations on the calibration target after the viewing angle changes, the extrinsic parameters of the projector have to be recalculated. This can be done by solving a classical Perspective-n-Point (PnP) problem given some known 3-D-to-2-D correspondences [17]. However, in our case such world-projector correspondences are not available.
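We show that, by chaining previously acquired extrinsic parameters, the rigid transformation from the world coordinate system to the projector-centered space can be estimated even without knowing any point correspondences. Let $\mathbf{T}^{c}_{i}$ and $\mathbf{T}^{p}_{i}$ be the extrinsic parameters of the camera and of the projector with respect to the $i$-th view, written as $4 \times 4$ homogeneous matrices $\begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix}$. Because the camera and the projector are rigidly mounted, the camera-to-projector transformation $\mathbf{T}^{p}_{i} \left(\mathbf{T}^{c}_{i}\right)^{-1}$ is the same for every view. The extrinsic parameters of the projector for the $n$-th view can therefore be estimated from $\mathbf{T}^{c}_{n}$ and previously calibrated extrinsic parameters as:

$$\mathbf{T}^{p}_{n} = \mathbf{T}^{p}_{i} \left(\mathbf{T}^{c}_{i}\right)^{-1} \mathbf{T}^{c}_{n} \qquad (7)$$

where $i \in \{1, 2, \dots, n-1\}$. Note that the error of previously calibrated parameters will propagate along the chain. It is therefore important to conduct the calibration and optimization procedures each time a new set of calibration data becomes available.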
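The chaining rule can be sketched with 4 × 4 homogeneous matrices in plain Python. The helper names are ours; the only assumption beyond the text is that the camera and projector are rigidly mounted, so their relative transform is the same in every view.

```python
def rigid(R, t):
    """Pack a 3x3 rotation (nested lists) and a translation into a 4x4 matrix."""
    return [list(R[0]) + [t[0]], list(R[1]) + [t[1]], list(R[2]) + [t[2]],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def rigid_inverse(T):
    """Inverse of a rigid transform: [R | t]^-1 = [R^T | -R^T t]."""
    Rt = [[T[c][r] for c in range(3)] for r in range(3)]
    t = [T[r][3] for r in range(3)]
    return rigid(Rt, [-sum(Rt[r][k] * t[k] for k in range(3)) for r in range(3)])


def chain_projector_pose(Tc_i, Tp_i, Tc_n):
    """Projector extrinsics for view n from an earlier view i.

    Assumes a rigid rig, so the camera-to-projector transform
    Tp_i * inv(Tc_i) is identical in every view.
    """
    cam_to_proj = matmul(Tp_i, rigid_inverse(Tc_i))
    return matmul(cam_to_proj, Tc_n)
```

In use, `Tc_n` would come from the camera's pose estimate of the target in the new view (Step 3), while `Tc_i` and `Tp_i` come from any previously calibrated view.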

Projector-World Correspondences from Homography
In our work, the projected feature points are not aligned to the control points printed on the calibration target. As a result, a mechanism is required to assign world coordinates to each projected feature point. The homography from the calibration plane to the image is estimated for this purpose.
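Under linear projection, the mapping from a pixel $(u, v)$ to a control point $(X, Y, 0)$ on the calibration plane ($Z = 0$) is encapsulated by a $3 \times 3$ homography matrix $\mathbf{H}$ as:

$$\begin{pmatrix} X \\ Y \\ 1 \end{pmatrix} \sim \mathbf{H} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} \qquad (8)$$

Given at least four point correspondences $(u_k, v_k) \leftrightarrow (X_k, Y_k, 0)$, the homography can be estimated by solving the over-determined homogeneous linear system [17]:

$$\begin{bmatrix} u_k & v_k & 1 & 0 & 0 & 0 & -X_k u_k & -X_k v_k & -X_k \\ 0 & 0 & 0 & u_k & v_k & 1 & -Y_k u_k & -Y_k v_k & -Y_k \end{bmatrix} \mathbf{h} = \mathbf{0}, \quad k = 1, \dots, N \qquad (9)$$

where $\mathbf{h}$ stacks the nine entries of $\mathbf{H}$ row by row and each correspondence contributes the two rows shown. In this work, the point correspondences are derived from the printed calibration points detected in the image and their world coordinates (i.e., the camera-world correspondences), as shown in Figure 3. Once the homography is estimated, a projected feature point detected at pixel $(u, v)$ can be associated with its world coordinates according to Equation (8).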
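A dependency-free sketch of the estimation is shown below. It contains one swapped-in simplification: instead of taking the SVD of the homogeneous system, it fixes h33 = 1 and solves the 8 × 8 normal equations by Gaussian elimination. That is adequate for small, well-conditioned coordinates but less robust than the SVD approach (in practice one would also normalize the coordinates first). All function names are hypothetical.

```python
def solve_linear(A, b):
    """Solve a small square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def estimate_homography(pixels, plane_points):
    """Homography H with (X, Y, 1) ~ H (u, v, 1), from >= 4 correspondences.

    Simplification: fixes h33 = 1 and solves the normal equations instead of
    taking the SVD of the homogeneous system.
    """
    A, b = [], []
    for (u, v), (X, Y) in zip(pixels, plane_points):
        A.append([u, v, 1, 0, 0, 0, -X * u, -X * v])
        b.append(X)
        A.append([0, 0, 0, u, v, 1, -Y * u, -Y * v])
        b.append(Y)
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(8)] for i in range(8)]
    Atb = [sum(row[i] * bk for row, bk in zip(A, b)) for i in range(8)]
    h = solve_linear(AtA, Atb) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(H, u, v):
    """Map pixel (u, v) to plane coordinates (X, Y)."""
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return ((H[0][0] * u + H[0][1] * v + H[0][2]) / w,
            (H[1][0] * u + H[1][1] * v + H[1][2]) / w)


pix = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
world = [(2 * u + 1, 3 * v - 2) for u, v in pix]
H = estimate_homography(pix, world)
print(apply_homography(H, 0.5, 0.5))  # approximately (2.0, -0.5)
```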
In real-world scenarios, the homography can be inaccurate when estimated from radially distorted pixels, which in turn results in imprecise projector-world correspondences. It is therefore important to compensate for the camera's lens distortion beforehand. Because the projector-world correspondences depend on the camera calibration in this way, they are updated each time the camera's distortion coefficients are adjusted.
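Compensating the distortion requires inverting the distortion model, which has no simple closed-form inverse. The sketch below uses fixed-point iteration, a common approach but not necessarily the one used by the authors. It assumes the Bouguet-toolbox coefficient ordering (k1, k2, k5 radial; k3, k4 tangential), and the function names are ours.

```python
def distort(x, y, k):
    """Forward radial/tangential model in normalized coordinates.

    k = (k1, k2, k3, k4, k5); ASSUMED ordering: k1, k2, k5 radial, k3, k4 tangential.
    """
    k1, k2, k3, k4, k5 = k
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k5 * r2 ** 3
    return (x * radial + 2 * k3 * x * y + k4 * (r2 + 2 * x * x),
            y * radial + k3 * (r2 + 2 * y * y) + 2 * k4 * x * y)


def undistort(xd, yd, k, iterations=20):
    """Invert `distort` by fixed-point iteration.

    Works well for moderate distortion; convergence is not guaranteed for
    strong distortion far from the image centre.
    """
    k1, k2, k3, k4, k5 = k
    x, y = xd, yd  # start from the distorted point
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1 + k1 * r2 + k2 * r2 ** 2 + k5 * r2 ** 3
        dx = 2 * k3 * x * y + k4 * (r2 + 2 * x * x)
        dy = k3 * (r2 + 2 * y * y) + 2 * k4 * x * y
        x, y = (xd - dx) / radial, (yd - dy) / radial
    return x, y


def pixel_to_undistorted(u, v, fx, fy, u0, v0, k):
    """Pixel -> normalized -> undistorted, ready for homography estimation."""
    return undistort((u - u0) / fx, (v - v0) / fy, k)


k = (-0.2, 0.05, 0.001, -0.002, 0.0)
print(undistort(*distort(0.3, -0.2, k), k))  # approximately (0.3, -0.2)
```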

Initial Correspondences from Line Patterns
The proposed method requires a "bootstrapping" stage to obtain an initial estimate of the parameters from which the incremental process can start. In previous work [15], we used a sequence of colored block patterns that extends the classical 1-D Gray-coded patterns to obtain initial correspondences. In this work, we adopt line features because they are easy and fast to detect, and also more robust against pattern interference and chromatic distortion. Figure 4(a) shows the image of a line pattern projected onto the checkerboard. Six lines are easy to identify, even though some segments of the projected lines are largely absorbed by the black squares.
A fast technique is designed to reliably locate the projected lines. It first searches for the position of the checkerboard in the image using detected corner features. Once the checkerboard is found, all pixels in its region are used to compute a dynamic threshold such that only the brightest 2% of the pixels survive binarization (see Figure 4(b) for an example). We then deploy a RANSAC technique to search for lines in the binarized image (see Algorithm 1 for its pseudo-code). Figure 4(c) shows 9 lines detected in real time. In our experiments, the adopted method proved more robust and faster than the Hough line transform, which is also a popular line detection algorithm in
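The thresholding and line search can be sketched as below. This is our own minimal version, not Algorithm 1: it keeps roughly the brightest 2% of pixels and fits one line by RANSAC with a fixed seed. The tolerance, iteration count, and function names are assumptions. Detecting all projected lines would repeat the fit after removing each line's inliers.

```python
import math
import random


def brightest_threshold(values, keep_fraction=0.02):
    """Intensity threshold that keeps roughly the brightest `keep_fraction` of pixels."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, int(len(ordered) * (1.0 - keep_fraction)))
    return ordered[idx]


def ransac_line(points, iterations=200, inlier_tol=1.0, seed=0):
    """Fit a single line a*x + b*y + c = 0 (with a^2 + b^2 = 1) by RANSAC.

    points: (x, y) coordinates of pixels that survived binarization.
    """
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)  # minimal sample
        a, b = y2 - y1, x1 - x2  # normal of the line through both samples
        norm = math.hypot(a, b)
        if norm == 0:
            continue  # duplicate points define no line
        a, b = a / norm, b / norm
        c = -(a * x1 + b * y1)
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) <= inlier_tol]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers


pts = [(x, 2 * x + 1) for x in range(20)] + [(5, 50), (10, -30), (3, 40)]
line, inliers = ransac_line(pts)
print(len(inliers))  # 20
```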

Target-Adapted Calibration Patterns
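In this section, we describe how the calibration patterns are generated and projected, and how the captured images are analyzed to improve the calibration.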

Pattern
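Given a set of intrinsic parameters and the estimated pose of the calibration target, the feature points $\mathbf{X}_j$ located on the calibration plane project onto the projector screen at:

$$\mathbf{q}_j = \Phi(\mathbf{X}_j; \boldsymbol{\theta}_p, \mathbf{R}_p, \mathbf{t}_p) \qquad (10)$$

A simple pattern can then be rendered with a dot at each $\mathbf{q}_j$, or with filled circles of radius $r$, given by:

$$P(\mathbf{q}) = \begin{cases} 1, & \min_j \lVert \mathbf{q} - \mathbf{q}_j \rVert \le r \\ 0, & \text{otherwise} \end{cases} \qquad (11)$$

Taking perspective distortion into account, we instead use corner features in a chessboard pattern. First, a base image $B$ is rendered in world coordinates, such that each feature is located at $(X_j, Y_j)$. Then the pattern $P$ is rendered according to:

$$P(u, v) = B\big(g(u, v)\big) \qquad (12)$$

by means of a mapping function $g: (u, v) \mapsto (X, Y)$. The mapping is essentially a homography transformation applied to the undistorted projector screen:

$$\begin{pmatrix} g(u, v) \\ 1 \end{pmatrix} \sim \mathbf{H}_p \, D^{-1}(u, v) \qquad (13)$$

where $D^{-1}(u, v) = (x, y, 1)^\top$ is the inverse function of Equations (3) and (4), defined by the intrinsic parameters, which reverses the effects of the radial and tangential distortions in normalized image coordinates, and $\mathbf{H}_p$ is the homography computed from the undistorted normalized feature locations $(x_j, y_j)$ and their world coordinates $(X_j, Y_j)$. In the implementation, both $P$ and $B$ are discrete images indexed by integer subscripts, while $g$ is a real-valued function. Interpolation (bilinear interpolation in our work) is therefore used to determine the value of each pixel.

Figure 6(d): Classified pixels; blue, red, and white pixels represent the "dark", "gray", and "bright" categories, respectively.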
Figure 5(a) illustrates a generated base image that contains corner features aimed at the square centers of the calibration target, a 16-by-12 checkerboard. Since the squares measure 20 mm × 20 mm, the pattern is rendered with shifts of 10 mm in both the x and y directions. The base image can be transformed according to Equations (12) and (13) to generate a target-adapted calibration pattern, as shown in Figure 5(b). The projection of this pattern can be found in Figure 6(a).
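Rendering by inverse mapping, as in Equation (12), means every projector pixel looks up a real-valued position in the base image and interpolates. The sketch below shows bilinear lookup in plain Python. `mapping` is a hypothetical stand-in for the function g (undistortion followed by the homography), and the images are small nested lists (at least 2 × 2) rather than real frames.

```python
import math


def bilinear(image, x, y):
    """Sample image[row][col] at real-valued column x and row y.

    Positions outside the image are rendered black (0.0).
    """
    h, w = len(image), len(image[0])
    if not (0 <= x <= w - 1 and 0 <= y <= h - 1):
        return 0.0
    x0 = min(int(math.floor(x)), w - 2)  # clamp so x0 + 1 stays in range
    y0 = min(int(math.floor(y)), h - 2)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x0 + 1]
    bottom = (1 - fx) * image[y0 + 1][x0] + fx * image[y0 + 1][x0 + 1]
    return (1 - fy) * top + fy * bottom


def render_pattern(base, width, height, mapping):
    """Inverse mapping: each projector pixel (u, v) reads the base image at mapping(u, v)."""
    return [[bilinear(base, *mapping(u, v)) for u in range(width)]
            for v in range(height)]


base = [[0, 10], [20, 30]]
print(render_pattern(base, 2, 2, lambda u, v: (0.5 * u, 0.5 * v)))
# [[0.0, 5.0], [10.0, 15.0]]
```

Iterating over the destination (projector) pixels rather than the source features guarantees that every projector pixel receives exactly one value, with no holes.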

Marker Detection and Matching in Sub-Pixel Accuracy
Image rectification is carried out using the camera-world homography (see Section 4.3); then Otsu's thresholding method [18] is applied twice to categorize the rectified pixels into "dark", "gray", and "bright" groups. The results are shown in Figure 6. Only "bright" pixels are kept, and all others are set to zero. The truncated image is then convolved with a checker marker, as shown in Figure 7(a). The result is normalized to produce corner scores, which measure how likely a corner feature is to occur at each pixel. The scores are sorted and filtered so that only the top 5% of the pixels remain. The resulting map is then segmented into regions, and for each region a weighted centroid is calculated to become a candidate feature point.
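Two steps of this pipeline, the double application of Otsu's method and the weighted centroid, can be sketched in plain Python as follows. The convolution with the checker marker, the top-5% filtering, and the region segmentation are omitted. The text does not state which subset the second Otsu pass operates on; the sketch assumes it runs on the pixels above the first threshold.

```python
def otsu_threshold(values, levels=256):
    """Otsu's method on integer intensities in [0, levels).

    Returns t such that class 0 is {v <= t} and the between-class
    variance is maximal.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total, sum_all = len(values), sum(i * h for i, h in enumerate(hist))
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def three_classes(values):
    """Apply Otsu twice: first split off 'dark', then split the rest.

    ASSUMPTION: the second pass runs on the pixels above the first threshold.
    """
    t1 = otsu_threshold(values)
    upper = [v for v in values if v > t1]
    t2 = otsu_threshold(upper) if upper else t1
    return ["dark" if v <= t1 else "gray" if v <= t2 else "bright" for v in values]


def weighted_centroid(region):
    """Score-weighted centroid of a region given as (x, y, score) triples."""
    total = sum(s for _, _, s in region)
    return (sum(x * s for x, _, s in region) / total,
            sum(y * s for _, y, s in region) / total)


print(three_classes([10, 10, 120, 240]))  # ['dark', 'dark', 'gray', 'bright']
```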
The feature points detected in the image have to be associated with $\mathbf{q}_j$, the rendered feature points on the projector's screen. As mentioned above, the matching can be done efficiently given $\mathbf{m}_j$, the expected locations of the feature points in the image. Starting from each expected location, we search for the nearest observation. If the nearest observation is within a tolerable range, the corresponding world-projector correspondence is updated via the camera-world homography. Otherwise, the feature point is marked as lost and is excluded from the refined calibration data. Figure 7(c) shows a pair of matched expected and observed feature points. The centroid extraction can also be applied to locate spot features, as shown in Figure 7(d).
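The nearest-neighbour association can be sketched as follows. The tolerance value and the function name are assumptions, and this simple version does not prevent two expected points from claiming the same observation; a stricter implementation might enforce one-to-one matches.

```python
import math


def match_features(expected, observed, tolerance):
    """Associate each expected image location with its nearest detection.

    expected: predicted image positions of the rendered features.
    observed: detected candidate centroids.
    Returns a list aligned with `expected`; None marks a lost feature.
    """
    matches = []
    for ex, ey in expected:
        best, best_d = None, math.inf
        for ox, oy in observed:
            d = math.hypot(ox - ex, oy - ey)
            if d < best_d:
                best, best_d = (ox, oy), d
        matches.append(best if best_d <= tolerance else None)
    return matches


print(match_features([(0, 0), (10, 10), (50, 50)], [(0.5, 0.2), (10.4, 9.7)], 2.0))
# [(0.5, 0.2), (10.4, 9.7), None]
```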

Rejection of Calibration Data
A dynamically generated calibration pattern can miss its target after projection if the previously calibrated parameters fail to generalize to the new pose of the calibration board. This usually occurs when the system has not yet collected enough calibration data and the board has been moved to a pose that differs significantly from its previous configurations. Such situations must be detected to prevent wrongly associated correspondences from being used as valid calibration data.
A generated calibration pattern fails if too few feature points are matched, i.e., if the observed results deviate greatly from our expectation. Hence, we reject a calibration pattern if more than 50% of its feature points are lost. The generalization error is also taken into account to improve robustness: the newly calibrated parameters are evaluated on each previously collected set of calibration data, and if including the new calibration data does not improve the overall performance in more than 50% of the calibrated views, the new data are rejected as well.
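The two rejection rules can be expressed as a single predicate. This sketch assumes the per-view errors before and after adding the new data are computed elsewhere; the function and argument names are hypothetical, and "improve" is interpreted as a strictly lower reprojection error.

```python
def accept_view(matches, errors_before, errors_after,
                max_lost_fraction=0.5, min_improved_fraction=0.5):
    """Return True if a newly captured view should be kept as calibration data.

    matches: per-feature match results, with None marking a lost feature.
    errors_before / errors_after: reprojection error of each previously
        collected view under the parameters calibrated without / with the
        new view (hypothetical inputs computed elsewhere).
    """
    lost = sum(1 for m in matches if m is None)
    if lost > max_lost_fraction * len(matches):
        return False  # rule 1: more than half of the features were lost
    if not errors_before:
        return True  # nothing to compare against yet
    improved = sum(1 for b, a in zip(errors_before, errors_after) if a < b)
    # rule 2: keep the view only if it helps in more than half of the old views
    return improved > min_improved_fraction * len(errors_before)


print(accept_view([1, None, 2, 3], [1.0, 1.0, 1.0], [0.9, 0.8, 1.1]))  # True
```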

Test Datasets
A projector-camera system has been set up, and a series of experiments have been conducted to evaluate how the proposed method improves the calibration process of a projector-camera system. The hardware specifications are listed in Table 1. The software is implemented on an Intel Core i7 quad-core laptop. The real-time detection of lines and all other computations are not GPU-accelerated. We use a customized checkerboard shown in Figure 6(a) as the calibration target. There are 192 corner features printed on the board, with 83 inner white squares for the projection of calibration feature points. The checkerboard's pose is changed 22 times during the acquisition of calibration data, with the first 4 poses used to generate the initial correspondences as described in Section 4.4.
Two different types of features, namely checker corners and light spots, are used to generate target-adapted calibration patterns (see Figures 7(c) and 7(d) for examples). The resulting calibration datasets are named ADACHECKERS and ADASPOTS, respectively. These two datasets are compared with the dataset CONVENTIONAL, which acquires camera-projector correspondences using 14 Gray-coded and 16 phase-shifted patterns for each viewpoint [3].
We implemented a Levenberg-Marquardt optimizer and applied it to all of the datasets with identical tuning parameters and termination criteria. The linear and nonlinear parts of the calibrated parameters are listed in Tables 2 and 3, respectively, together with the calibrated camera parameters for reference. The asymptotic standard errors, derived from the inverse of the numerically approximated Hessian matrix, are also given in the tables to indicate the confidence of the parameter estimates. The explicit establishment of camera-projector correspondences requires 704 frames to finish the calibration, while the proposed method uses 40 frames, less than 6% of the number required by the conventional method. Other results are discussed in the rest of this section.

Evaluation in Projective Plane
Reprojection error (RPE) is a commonly adopted projective indicator of how well the calibration data conform to the projection model with the calibrated parameters [7,8,10-14]. We measured the reprojection errors in the x- and y-coordinates separately according to Equations (5) and (6), and computed their root mean square (RMS) to summarize the performance of the parameters with respect to a dataset. After calibrating all 22 views, the datasets ADACHECKERS, ADASPOTS, and CONVENTIONAL achieved RPEs of 0.188, 0.301, and 2.196 pixels, respectively. Figure 8 depicts their RPEs at each stage, as the calibration data from each new viewpoint are added. The adaptively rendered calibration patterns initially produce significant errors; however, the errors decrease as more calibration data are collected, approaching a lower bound. The RPE of ADASPOTS is 1.6 times that of ADACHECKERS. A likely cause is the algorithm adopted to locate spot centroids: the centroid of a projected spot is not preserved under perspective projection. Compared to the use of explicit correspondences, the adaptively established datasets ADACHECKERS and ADASPOTS yield improvements of 91% and 86%, respectively.
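The reported percentages are relative reductions with respect to the CONVENTIONAL baseline; the short calculation below reproduces them, and the 1.6 ratio, from the RPE values in the text.

```python
def improvement(baseline, value):
    """Relative error reduction with respect to a baseline, in percent."""
    return 100.0 * (baseline - value) / baseline


rpe = {"ADACHECKERS": 0.188, "ADASPOTS": 0.301, "CONVENTIONAL": 2.196}
for name in ("ADACHECKERS", "ADASPOTS"):
    print(name, round(improvement(rpe["CONVENTIONAL"], rpe[name])))
# ADACHECKERS 91
# ADASPOTS 86
print(round(rpe["ADASPOTS"] / rpe["ADACHECKERS"], 1))  # 1.6
```

Applying the same formula to the planarity RMS errors of 0.36, 0.65, and 2.42 mm gives the 85% and 73% improvements reported for the planarity test.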

Evaluation in Euclidean Space: Planarity Test
Evaluating the optimized parameters with criteria that are not modeled in the objective function is important, so we conducted another test to verify the performance of the calibrated parameters in 3-D space. The parameters are used to triangulate the 3-D position of each control point. Since a planar target is used, all of the control points are expected to lie in a plane. The flatness of the manufactured calibration target has been verified to within 0.1 mm.
Best-fit planes of the measured 3-D points are estimated, and the residuals are calculated. The box plots of the residuals are depicted in Figure 9. The overall 3-D RMS errors for ADACHECKERS, ADASPOTS, and CONVENTIONAL are 0.36 mm, 0.65 mm, and 2.42 mm, respectively. Improvements of 85% and 73% are thus achieved for ADACHECKERS and ADASPOTS, respectively.
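A plain-Python sketch of the planarity measure follows. As a simplification, it fits z = ax + by + c by ordinary least squares rather than a total-least-squares (orthogonal) plane fit, then reports the RMS of point-to-plane distances; this is adequate when the board is not close to edge-on in the chosen frame. The function names are ours.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Fit z = a*x + b*y + c by ordinary least squares (Cramer's rule)."""
    n = len(points)
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    M = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = _det3(M)
    coeffs = []
    for i in range(3):
        Mi = [row[:] for row in M]
        for r in range(3):
            Mi[r][i] = rhs[r]
        coeffs.append(_det3(Mi) / d)
    return coeffs  # [a, b, c]


def planarity_rms(points):
    """RMS of point-to-plane distances to the fitted plane."""
    a, b, c = fit_plane(points)
    norm = math.sqrt(a * a + b * b + 1.0)
    dists = [(z - (a * x + b * y + c)) / norm for x, y, z in points]
    return math.sqrt(sum(e * e for e in dists) / len(dists))


print(round(planarity_rms([(0, 0, 0.1), (1, 0, -0.1), (0, 1, -0.1), (1, 1, 0.1)]), 6))  # 0.1
```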

Evaluation in Euclidean Space: Triangulation Error
The projector-camera system can be utilized as a structured light 3-D scanner once the extrinsic and intrinsic parameters of the camera and the projector are obtained. Given a dense correspondence map, one may apply triangulation to recover the surface of an object. The triangulation error is defined as the shortest Euclidean distance between a pair of back-projected rays. It can be used to evaluate the calibration, since a more accurate set of parameters implies that the back-projected rays are more precise, and consequently, the triangulation errors will be lower.
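The shortest distance between two back-projected rays has a closed form; the sketch below computes it in plain Python. Returning the midpoint of the shortest segment as the triangulated point is a common convention that we assume here; the text does not specify how the 3-D point is chosen. The function names are ours.

```python
import math


def _dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def triangulate_rays(o1, d1, o2, d2):
    """Closest points between two back-projected rays o + s*d.

    Returns (gap, midpoint): gap is the triangulation error, i.e. the
    shortest distance between the rays; the midpoint is an assumed choice
    for the triangulated 3-D point.
    """
    w0 = [p - q for p, q in zip(o1, o2)]
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12 * a * c:  # (nearly) parallel rays
        s, t = 0.0, e / c
    else:
        s, t = (b * e - c * d) / denom, (a * e - b * d) / denom
    p1 = [o + s * k for o, k in zip(o1, d1)]
    p2 = [o + t * k for o, k in zip(o2, d2)]
    gap = math.dist(p1, p2)
    midpoint = [(x + y) / 2 for x, y in zip(p1, p2)]
    return gap, midpoint


print(triangulate_rays((0, 0, 0), (1, 0, 0), (5, 2, 7), (0, 0, 2)))
# (2.0, [5.0, 1.0, 0.0])
```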
A statue with a highly irregular shape is scanned by the projector-camera system. The statue measures 450 mm × 250 mm × 240 mm (height × width × depth) and is placed inside the working space spanned by the positions of the calibration board. About 105,000 points are triangulated using parameters calibrated from the ADACHECKERS and CONVENTIONAL datasets, yielding RMS errors of 2.71 mm and 5.10 mm, or 1.8% and 3.4% of the depth of the recovered surface (145 mm), respectively. The error maps are shown in Figures 10(a) and (b). When compared with the depth map shown in Figure 10(c), the error map obtained with parameters calibrated from the CONVENTIONAL dataset shows observable systematic errors. This is a commonly observed phenomenon when a set of parameters does not generalize well to the overall volume: as a measured point moves away from the region where the parameters are optimal, the triangulation errors increase quadratically. Based on this observation, we may verify that the parameters calibrated from the ADACHECKERS dataset are more robust, since they conform better over a wider range.

Conclusions and Future Work
In this paper, we have presented an innovative method to reliably establish calibration datasets for a projector-camera system. The method calibrates both the camera and the projector using a single checkerboard as the calibration target, which is easy to obtain and widely used in computer vision. The dynamically generated calibration patterns contain feature points for the calibration of the projector. Each feature point is arranged to hit the center of a particular white square on the calibration target, where it can be detected accurately. With a feedback mechanism, the system incrementally increases the accuracy of the generated patterns. As a result, establishing a calibration dataset becomes more accurate and faster than with dense acquisition of camera-projector correspondences. In the experiments, the proposed method achieves an improvement of more than 80% over the conventional method in both the projective and Euclidean tests, while saving 95% of the required calibration time. Sub-pixel RPEs are also attainable.
In the future, we aim to develop a system that tracks calibration targets and projects aligned calibration patterns in real time. As can be seen in Figure 6, the corner features printed on the calibration target remain distinguishable because the projected calibration pattern is interleaved with them. Such a real-time application could guide a user to move the calibration target toward unsampled regions instead of placing it randomly, since maximizing the coverage of the working volume during the collection of calibration data is crucial for accurate 3-D measurement. The proposed method can also be adapted to environment-aware data projection, such as in augmented reality applications.