Indoor Scene Point Cloud Registration Algorithm Based on RGB-D Camera Calibration

With the increasing popularity of RGB-depth (RGB-D) sensor, research on the use of RGB-D sensors to reconstruct three-dimensional (3D) indoor scenes has gained more and more attention. In this paper, an automatic point cloud registration algorithm is proposed to efficiently handle the task of 3D indoor scene reconstruction using pan-tilt platforms on a fixed position. The proposed algorithm aims to align multiple point clouds using extrinsic parameters of the RGB-D camera obtained from every preset pan-tilt control point. A computationally efficient global registration method is proposed based on transformation matrices formed by the offline calibrated extrinsic parameters. Then, a local registration method, which is an optional operation in the proposed algorithm, is employed to refine the preliminary alignment result. Experimental results validate the quality and computational efficiency of the proposed point cloud alignment algorithm by comparing it with two state-of-the-art methods.


Introduction
Three-dimensional (3D) scene reconstruction is an important issue for several applications of robotic vision such as map construction [1], environment recognition [2], augmented reality [3,4], and simultaneous localization and mapping (SLAM) [5,6]. Furthermore, 3D scene reconstruction usually requires numerous types of sensors such as stereo cameras, RGB-depth cameras, time-of-flight (TOF) cameras, lasers, and LIDAR. In this paper, we discuss how to employ RGB-D camera data for 3D scene reconstruction applications. RGB-D cameras use the structured light technique [7] to re-project the dispersed 3D data point set. The 3D colored point cloud information is obtained by using the RGB-D feature of combining two-dimensional (2D) RGB and depth images. Each frame of the RGB-D point cloud information has an independent coordinate system, which can be defined as the camera coordinate system C i at time i. When reconstructing a system, we expect to map each coordinate system C i to the same world coordinate system W, which serves as the same mapping target for each coordinate system C i . Therefore, in addition to defining a world coordinate system W, transformation matrices that transform each coordinate system C i to the world coordinate W are required. In our method, extrinsic parameters obtained from camera calibration are used to derive the transformation matrices which are then used to achieve a coarse reconstruction. Finally, some of the existing fine registration methods are employed to conduct the optimal 3D scene reconstruction.
Until now, several multi-view camera calibration studies have been performed with the aim of obtaining the transformation matrices for the transformation between different views. Transformation matrices have been used in numerous applications, for example, in a previous report [8], the bundle adjustment method was proposed to be employed to calibrate multi-view stereo cameras. In another report [9], the use of landmarks in a given scene as calibration references was proposed and the existing feature-matching methods were employed in the study to conduct multi-view landmark matching. Multi-view camera calibration was achieved in this manner, based on which a novel camera sequence method was suggested for integrating the information in an entire system and promoting the robustness of future applications. Li et al. [10] recommended using a feature description-based method, instead of the traditional checkerboard methods for camera calibration, by employing a special reference diagram as the camera calibration reference point in the feature description matching. In this study, we use the most common checkerboard multi-view camera calibration method to obtain the transformation relationships for different rotation angles, and apply them for point cloud registration. Holz et al. [11] documented the basic implementation method of using a point cloud library (PCL) to conduct registration. The method primarily involved capturing and aligning the feature descriptors to conduct an initial registration of the point clouds, and using the classic iterative closest point (ICP) [12] method to achieve fine registration. For all the operations, relevant application programming interfaces were provided by the PCL.
Local registration methods have proven to be able to generate an accurate reconstruction model if multiple point clouds are close enough to each other. Recently, several local registration methods with a higher accuracy have been proposed. Some studies [13,14] provide an overview of the fine and coarse registration methods. According to Reference [14], there are four types of local registration methods: the ICP method, Chen's method [15,16], signed distance fields [17], and genetic algorithms [18]. Here, we chose the ICP method for our research because it is the most commonly used method in practice. The ICP method was proposed by Besl et al. [12] for the efficient registration of 3D shapes including free-form curves and surfaces. Subsequently, various extensions of the ICP method have emerged. Rusinkiewicz et al. [19] derived an ICP variant based on the uniform sampling of the space of normals which could provide comparatively good convergence results for relatively small scenes and sparse features. Low [20] suggested approximating the nonlinear least-squares optimization problem with a linear least-squares problem to solve the point-to-plane ICP algorithm. The author used an approximation method to increase the efficiency for determining the optimal solution. Serafin and Grisetti [21] formulated a new optimized cost function that included not only the point-to-point distance, but also surface normals or tangents; the new function was shown to not only increase the convergence speed, but also be advantageous in regard to the function robustness. For example, in a report [22] a system called KinectFusion was proposed; the system was based on a graphics processing unit (GPU)-accelerated ICP method and used a handheld moving Kinect to reconstruct indoor 3D scenes.
Feature descriptors are a technique that has been developed over a long time, evolving from the original 2D image feature descriptors to the current several 3D feature descriptors. Compared with 2D descriptors, 3D descriptors are richer in geometric information. Regardless of the computational efficiency decline owing to the increase in information, 3D descriptors can provide more accurate information regarding 3D objects or scenes. Rusu et al. [23,24] recommended the point feature histogram (PFH) descriptor, and further improved it to yield the fast point feature histogram (FPFH) [25] that was applied to point registration problems. FPFH significantly reduced the computational load of PFH by using some cache methods and modifying a few formulas; consequently, PFH was able to immediately achieve 3D point registration. Tombari et al. [26] recommended the signature of histograms of orientations (SHOT) descriptor, and reviewed some of the existing methods, classifying them as signatures and histograms methods. Then, the two types of methods were combined to generate a new 3D descriptor method, with the SHOT information lying between the signatures and histograms, and exhibiting both their features. Nascimento et al. [27] suggested a binary appearance and shape elements (BASE) descriptor that integrated both the intensity and shape information of a point cloud, and it was established in binary, thereby significantly enhancing the matching speed. Furthermore, the BASE descriptor could be applied to the scenes that were under poor lighting conditions and had sparse textures. Schmiedel et al. [28] proposed the interest-robust normal distribution transforms (NDT)-map (IRON) descriptor and applied it to robot localization; this descriptor was a feature descriptor used to reflect surface curvatures and object shapes, and it was established via NDT mapping.
On the other hand, a different way to reconstruct 3D indoor scenes using multiple RGB-D cameras can be interpreted as a fusion of depth maps obtained from different views, which is known as RGB-D image stitching [29,30]. In Reference [29], Song et al. proposed a rotated top-bottom dual-Kinect system to extend the field of view (FOV) of depth scanning based on the synchronized alignment of two RGB-D sensors. In other words, they employed two Kinects to maximize the FOV by stitching two depth images together to form a depth panorama; however, their system requires at least two RGB-D cameras to accomplish the RGB-D image stitching task. Recently, Li et al. [30] proposed an image-based RGB-D image stitching method that uses the registration data from color images to register depth maps. In other words, a calibration process is firstly conducted to find the relationship between the depth map and the color image, which is then used to align the depth map with the color image. By doing so, the problem of registering depth maps can be transformed into the problem of color image stitching, which usually requires significant computation and is difficult to run in real time.
From the above literature review, it can be seen that most of the modern point cloud alignment algorithms require a lot of computing power. With a greater amount of valid data in the point cloud, the required amount of computing will grow in a non-linear fashion. To improve the processing speed of point cloud alignment, this paper presents a novel calibration-based method which employs camera calibration techniques to find out the transformation matrices between some prefixed motor control points in the offline state, and then uses these transformation matrices directly in the online state. However, current camera calibration approaches usually induce calibration errors in the extrinsic transformation matrices, which may cause large errors in the registration result. To overcome this issue, an alignment calibration method is proposed to refine the calibrated transformation matrices. Unlike the FOV extension approaches in [29,30], the proposed method uses a single RGB-D sensor combined with a sequential panning-tilting capturing process to produce an accurate global registration result. Moreover, the computational load of the proposed method is very low because it only needs a simple coordinate transformation computation for each input point cloud data.
The following content describes the indoor scene reconstruction system that is proposed in this study. First, we introduce the related work in Section 2, including a simple description and classification of the current registration methods. Second, Section 3 systematically presents the processing steps of the proposed offline and online operations. Next, Section 4 provides the implementation details, including the offline mathematical models for system calibration and the coordinate system transformation methods for online operations. Last, we compare the details of several different point cloud alignment methods and present the results in Section 5. The conclusion is given in Section 6.

Related Work
Point registration has been a very actively researched topic. It is the process of performing a transformation to determine the mapping relationship between two point clouds in the 3D space. A transformation relationship can be expressed in terms of rotation and translation, using camera calibration models to describe how cameras project 3D scenes to 2D images. In the following expression for a camera calibration model we see that 2D-3D projection is determined by an extrinsic parameter matrix R c w t c w ∈ 3×4 and intrinsic parameter matrix K. Here, R c w is a rotation matrix and t c w is a translation vector from the world frame to the camera frame. Then, u v 1 T is a homogenous coordinate of a 2D pixel on the image plane, whereas x y z 1 T is a homogenous coordinate of a 3D point in the world frame. s is a size ratio related to the depth of field, and it can be obtained from the depth images of a RGB-D camera.
Assume that there are two point data p i ∈ P and q j ∈ Q, where p i = [x i , y i , z i ] T is the ith point of the point cloud P and q j = [x j , y j , z j ] T is the jth point of the point cloud Q. If there is a mapping relationship between p i and q j , the Euclidean distance can be expressed as p i − q j . To reduce the distance, we must define a cost function in terms of the rotation matrix R q p and translation vector t q p between two point clouds P and Q. According to Reference [31], there are three different types of point cloud registrations-namely global, local, and local descriptors registration-to provide the 3D coordinate rotation matrix R q p and translation vector t q p . In the following section, we describe the difference between these three types of registrations and introduce their representative methods.

Local Registration
In this method, the distance between two point clouds should be short enough. Given a distance that is sufficiently short, a more accurate and rigid transformation may be obtained with the local registration of point clouds. Recently, the most dominant method has been the ICP method which determines the adjacent points in the two close point clouds P and Q, and defines a cost function E q p as follows: where p l k and q l k are the point correspondences that are sufficiently close to each other. E local is the sum of the errors of all the N point correspondences and k is the corresponding index. Then, an optimization method can be employed to determine the optimal E local . The advantage of this process is that it can evaluate the corresponding data points that are sufficiently close enough to each other, and thereby achieve a more accurate registration; however, this method has a prerequisite that the initial point clouds should be close enough to each other.
To minimize the cost function (Equation (2)), the random sample consensus (RANSAC) method can be implemented to iteratively obtain the optimal rotation matrix R q p and translation vector t q p . This process is advantageous in that it is able to determine the corresponding point clouds, which are sufficiently close to each other and generate a registration that is more accurate. In addition to the requirement of a short enough distance between the two initial point clouds, this process also involves significant computation. Segal et al. [32] proposed a new plane-to-plane ICP method incorporating the original and point-to-plane ICP methods, and demonstrated that the new method could provide better experimental results than the previous two methods.

Global Registration
Global registration can be initiated with two point clouds in any status, without requiring them to be close enough to each other, and is able to obtain a result similar to local registration. Several methods belong to the category of methods globally searching the point correspondences between two sets of point clouds or image features. Alternatively, a few methods employ some geometric approaches to determine the local point correspondences, i.e., the four-point congruent sets (4PCS) method [33]. When the point correspondences in the point clouds P and Q are obtained, the cost function can be where p g k and q g k are the detected point correspondences. Next, the cost function E q p can be obtained using an optimization method. The advantage of this process is that the point cloud registration can be conducted at a random position, but this process usually has a higher degree of computation complexity as compared with local registration. In Reference [33], the 4PCS method is proposed for global registration. Figure 1 illustrates a schematic of the four-point algorithm in the 4PCS method. This method randomly three points in a point cloud, referred to as p 1 , p 2 , and p 3 , and then uses a three-point plane to determine a fourth point p 4 . Subsequently, this method employs the cross-connected lines to obtain a cross point e, and calculates the ratio of the cross point distance r 1 = p 1 − e / p 1 − p 2 and r 2 = p 3 − e / p 3 − p 4 . Given r 1 and r 2 , all the points in the point cloud Q can now be connected, and the online candidates e 1 = q i + r 1 (q j − q i ) and e 2 = q i + r 2 (q j − q i ) can be calculated according to r 1 and r 2 . When the candidates e 1 and e 2 overlap with each other, the four points that correspond to P can be evaluated. With the four points, it is possible to obtain the rotation matrix R q p and translation vector t q p in the 3D space. The advantage of global registration is that this method can conduct point cloud registration at a random position, but this method usually leads to larger registration errors as compared with local registration. point correspondences. Next, the cost function q p E can be obtained using an optimization method.
The advantage of this process is that the point cloud registration can be conducted at a random position, but this process usually has a higher degree of computation complexity as compared with local registration. In Reference [33], the 4PCS method is proposed for global registration. Figure 1 illustrates a schematic of the four-point algorithm in the 4PCS method. This method randomly three points in a point cloud, referred to as p 1 , p 2 , and p 3 , and then uses a three-point plane to determine a fourth point p 4 . Subsequently, this method employs the cross-connected lines to obtain a cross point e, and calculates the ratio of the cross point distance can be calculated according to r 1 and r 2 . When the candidates e 1 and e 2 overlap with each other, the four points that correspond to P can be evaluated. With the four points, it is possible to obtain the rotation matrix q p R and translation vector q p t in the 3D space. The advantage of global registration is that this method can conduct point cloud registration at a random position, but this method usually leads to larger registration errors as compared with local registration.

Local Descriptors Registration
Local descriptor registration requires a feature descriptors matching process to search for point correspondences in two point clouds, with the data of the two point clouds not required to be close to each other initially. By contrast, local and global registration methods search for point correspondences locally or globally under certain conditions without using feature descriptors, leading to an inefficient point correspondence determination process.
The feature descriptors usually contain information about point features in a local region such as surface, normal, signature, and texture. In this method, the feature descriptors may be extracted by different approaches to have the search capability for the point correspondences. By using feature descriptors to search for similar points that are the most close to one another in terms of the Euclidean distance, we can determine the point correspondences in two point clouds, with k as the corresponding index. Furthermore, we can use the point correspondences to define the cost function as A relatively classic descriptor is the PFH descriptor which extracts the translation invariance feature from a key point. This feature descriptor is a 3D data that is related to the angles between the key point and an adjacent point. In a study [26], further developments have been made to this feature descriptor to obtain the FPFH descriptor that was used in point registration problems. It was shown that the computation could be significantly reduced by using some cache methods and modifying formulas, and, therefore, the 3D point registration could be conducted instantly. Zhou et

Local Descriptors Registration
Local descriptor registration requires a feature descriptors matching process to search for point correspondences in two point clouds, with the data of the two point clouds not required to be close to each other initially. By contrast, local and global registration methods search for point correspondences locally or globally under certain conditions without using feature descriptors, leading to an inefficient point correspondence determination process.
The feature descriptors usually contain information about point features in a local region such as surface, normal, signature, and texture. In this method, the feature descriptors may be extracted by different approaches to have the search capability for the point correspondences. By using feature descriptors to search for similar points that are the most close to one another in terms of the Euclidean distance, we can determine the point correspondences p d k ∈ P and q d k ∈ Q in two point clouds, with k as the corresponding index. Furthermore, we can use the point correspondences to define the cost A relatively classic descriptor is the PFH descriptor which extracts the translation invariance feature from a key point. This feature descriptor is a 3D data that is related to the angles between the key point and an adjacent point. In a study [26], further developments have been made to this feature descriptor to obtain the FPFH descriptor that was used in point registration problems. It was shown that the computation could be significantly reduced by using some cache methods and modifying formulas, and, therefore, the 3D point registration could be conducted instantly. Zhou et al. [34] recommended a fast global registration method based on FPFH which allowed rapid registration convergence; compared with some of the existing methods, this method was a more real-time computation while having a certain registration accuracy.

System Architecture
In this section, we introduce the system architecture proposed in this study. Figure 2 illustrates the system architecture of the 3D scene reconstruction. First, the point cloud information for the scene reconstruction has to be obtained via device sensing. In an indoor scene, we can use a pan-tilt-driven RGB-D camera to record parts of the scene. It can be seen from Figure 3  the pan-tilt records continuous frames F 1 , F 2 , and F 3 . Here, T 2 1 is the view transformation matrix to transform F 1 to F 2 , whereas T 3 2 is the view transformation matrix to transform F 2 to F 3 . F i contains the information of the RGB and depth images, which are transformed-with tools-to yield the color point cloud information P i = {{p 1 , c 1 }, {p 2 , c 2 }, ...}. Here, p j = [x j , y j , z j ] T is the position of point j in a point cloud (in the F i coordinate) and c j = [r j , g j , b j ] T is the color of point j in the point cloud. Given that each F i has its own coordinate system, it is necessary to use the transformation matrix T 0 i to combine those coordinates into one. Note that the proposed system does not include processes of feature matching, outlier removal, and motion estimation, because the required transformation matrices are obtained from the offline calibration. This design greatly reduces the processing time of 3D scene reconstruction and is also the main difference between the proposed method and the existing ICP-based methods.
real-time computation while having a certain registration accuracy.

System Architecture
In this section, we introduce the system architecture proposed in this study. Figure 2 illustrates the system architecture of the 3D scene reconstruction. First, the point cloud information for the scene reconstruction has to be obtained via device sensing. In an indoor scene, we can use a pan-tiltdriven RGB-D camera to record parts of the scene. It can be seen from Figure 3  T is the view transformation matrix to transform F1 to F2, whereas 3 2 T is the view transformation matrix to transform F2 to F3. Fi contains the information of the RGB and depth images, which are transformed-with tools-to yield the color point cloud information Pi = {{p1, c1}, {p2, c2}, ...}. Here, pj = [xj, yj, zj] T is the position of point j in a point cloud (in the Fi coordinate) and cj = [rj, gj, bj] T is the color of point j in the point cloud. Given that each Fi has its own coordinate system, it is necessary to use the transformation matrix 0 i T to combine those coordinates into one. Note that the proposed system does not include processes of feature matching, outlier removal, and motion estimation, because the required transformation matrices are obtained from the offline calibration. This design greatly reduces the processing time of 3D scene reconstruction and is also the main difference between the proposed method and the existing ICP-based methods.

Forward Kinematics Approach
Because the RGB-D camera is driven by a pan-tilt device, the transformation matrix can be directly obtained from a forward kinematics analysis of the motion platform [35]. Figure 4 illustrates the motion model of a pan-tilt device considered in the proposed system. Because the coordinate transformation of the camera frame is determined by connection links of the pan-tilt unit, it is mandatory that the transformation between each connection should be taken into consideration.

Forward Kinematics Approach
Because the RGB-D camera is driven by a pan-tilt device, the transformation matrix can be directly obtained from a forward kinematics analysis of the motion platform [35]. Figure 4 illustrates the motion model of a pan-tilt device considered in the proposed system. Because the coordinate transformation of the camera frame is determined by connection links of the pan-tilt unit, it is mandatory that the transformation between each connection should be taken into consideration. Given that the pan, tilt, RGB-D camera, and camera frames are all on different coordinate axes, the corresponding forward kinematics Equation can be derived from the conventional Denavit-Hartenberg (D-H) link parameter method [36], which defines four D-H link parameters: the link length a k , the link twist α k , the link offset d k , and the joint angle θ k . Table 1 shows the D-H link parameters of the pan-tilt device used in the experiment (see Section 5). According to Table 1, the transformation matrix between k−1 and k joint frames can be computed by where c θ ≡ cos θ and s θ ≡ sin θ. Based on the joint coordinate transformation in Equation (3), the forward kinematics Equation of the pan-tilt device can be obtained by chain multiplying four link transformation matrices such that where θ p and θ t denote the pan angle and the tilt angle of the device, respectively. Therefore, the transformation matrix in Equation (4) can be used to align the point cloud at the frame F i to F 0 directly.

Camera Calibration Approach
Another way to obtain the transformation matrix information of each F i coordinate frame pertains to the offline calibration of the device. In this approach, the motor must be rotated to generate different camera views, and the rotation control is achieved via offline decision-makers. The following are the steps for the offline calibration operations: 1.
Start to capture the initial frame F 0 , and define its coordinate as a world coordinate W.

2.
Conduct multi-view calibration to obtain the extrinsic parameter matrix for the initial world coordinate.

3.
Control the motor and allow it to rotate to the next point.

4.
Capture the frame F i , and conduct multi-view calibration to obtain the extrinsic parameter matrix.

5.
Generate point cloud P i , and conduct point cloud alignment calibration to obtain the transformation matrix. 6.
If completed, store all the transformation matrices; otherwise, go back to Step 3 to make a decision.
Therefore, prior to a scene reconstruction, we require conducting offline calibration to obtain multi-view transformation matrices. The implementation of offline calibration will be discussed in detail in Section 3.
With respect to online operation, we first develop a model for the 3D scene reconstruction. Because this system consists of a motor-controlled and point cloud-registered part, the online decision-makers are entirely responsible for the allocation and processing. Subsequently, 3D scene reconstruction is conducted according to the following steps: 1.
Start to capture the initial frame F 0 and generate point cloud P 0 , and define the coordinate of the point cloud P 0 as the world coordinate W.

2.
Control the motor and allow it to rotate to the next point, wait until the motor rotates to the fixed point, and then haul back the motor to complete the command.

3.
Capture the frame F i and generate a point cloud P i .

4.
Transform the point cloud P i to P i 0 using the corresponding transformation matrix T i 0 estimated in the offline calibration.

5.
If completed, generate the initial 3D scene reconstruction model; otherwise, go back to Step 3 to make a decision of aligning the next point cloud. 6.
If the fine registration step is enabled, then use one of the existing ICP methods to refine the initial 3D scene reconstruction model; otherwise, use the initial 3D scene reconstruction model as the final result. 7.
Generate the final 3D scene reconstruction model.
When the 3D scene reconstruction is complete, the outlier points can be removed using some of the existing outlier removal methods [37,38]. The detailed operations of the point cloud registration will be presented in Section 4.

The Proposed Method
This study primarily proposes a 3D scene construction method that integrates motor control and camera calibration. The camera calibration is conducted via a camera to first detect and identify a key point in the checkerboard. With offline calibration, a relative transformation matrix for the camera in a fixed position can be obtained and used for global registration. In the following content, we present the offline calibration and global registration methods proposed by us, and also introduce how to integrate them with local registration.

Offline Calibration
This subsection primarily introduces the proposed offline calibration method of generating multi-view transformation matrices based on multi-view camera calibration. Suppose that the intrinsic parameter matrix K can be obtained by fixing the extrinsic parameter matrix as [ I 0 3×1 ] (do not move the view) and using camera calibration [10]. Our method aims to obtain the extrinsic parameter matrix between the initial frame F 0 and the ith frame F i . Figure 5 illustrates the different images in which the object (checkerboard) exists in each image captured from a different view. In this approach, the intrinsic parameters are fixed, whereas the extrinsic parameter matrices are obtained via camera calibration. Based on the extrinsic parameters, the following formula is obtained: where P r is the coordinate of the key point on the checkerboard that is used as the coordinate system. T i r is the transformation matrix to map P r to P i . With camera calibration, we can obtain the transformation matrices at different views. Next, we want to obtain the relative transformation matrix between two views. Provided that F 0 is defined as the world coordinate W, we can assume that all P i should be transformed to F 0 , and we thereby obtain the following formula: Then we expand and reorganize Equation (6) to form a complete transformation formula of P i to F 0 such that where we define the rotation matrix R 0 i = R 0 r (R i r ) T for the transformation of P i to the F 0 coordinate system and the translation vector t 0 i = −R 0 r (R i r ) T t i r + t 0 r . Finally, we calculate a transformation matrix of each view according to Equation (7), and thereby obtain all the transformation matrices as follows: These are the initial transformation matrices that are obtained by applying linear methods to the 2D point correspondences. However, these transformation matrices encounter calibration errors, which may deteriorate the registration result. To deal with this issue, we include fine registration in the offline calibration, a step that needs the identifiable geometric features of a scene. We can place some simple geometric objects in the scene as shown in Figure 6, in which each scene contains some geometric objects in addition to the board. The next step is to conduct 3D point alignment calibration, and this step requires local registration for conducting calibration. Here, we choose to use ICP. Because we can choose the way that the objects are placed in a scene to obtain good results, we do not need to select a complex method as long as the geometric objects are separated enough from one another and are representative. This part must be conducted using two adjacent point clouds. Suppose that point cloud P 0 i which consists of each point P i and P 0 can be obtained after the transformation matrix in Equation (8), and, therefore, two adjacent point clouds can be denoted as P 0 i−1 and P 0 i . Next, with alignment calibration, we can obtain a refinement transformation between P 0 i−1 and P 0 i as follows: where R (i) 0 and t (i) 0 are the ith refinement rotation matrix aligning P 0 i to P 0 i−1 in the F 0 coordinate system and the ith refinement translation vector, respectively. Given that P 0 i and P 0 i−1 are all in the F 0 coordinate system, we can substitute Equation (7) into Equation (9) to form a new transformation relationship as follows: According to Equation (10), we can obtain the aligned transformation matrix against the calibration errors such thatT Finally, the information of these transformation matrices is stored for use in the online operation.
we do not need to select a complex method as long as the geometric objects are separated enough from one another and are representative. This part must be conducted using two adjacent point clouds. Suppose that point cloud 0 i P which consists of each point P i and P 0 can be obtained after the transformation matrix in Equation (8), and, therefore, two adjacent point clouds can be denoted as 0 1 − i P and 0 i P . Next, with alignment calibration, we can obtain a refinement transformation  (7) into Equation (9) to form a new transformation relationship as follows: According to Equation (10), we can obtain the aligned transformation matrix against the calibration errors such that (11) Finally, the information of these transformation matrices is stored for use in the online operation.

Online Operation
This subsection primarily introduces how to use the multi-view transformation matrices to achieve initial and fine registration. Given that we have stored the transformation matrices in offline calibration, it will be easy for us to rapidly achieve initial registration. In Section 3, we introduced the initial world coordinate W and defined it as the coordinate of F0, while all the transformation matrices that we have obtained map to F0. First, we take F1 as an example. Provided that we want to let point cloud P1 of F1 be transformed to F0, we can use the following formula: Furthermore, an optional fine registration can be used to reduce the convergence error between the cloud point of P0 and that of P1. In other words, we conduct fine registration by using some existing local registration methods, such as ICP. After refining the transformation matrix, the new point cloud will be generated as shown below,

Online Operation
This subsection primarily introduces how to use the multi-view transformation matrices to achieve initial and fine registration. Given that we have stored the transformation matrices in offline calibration, it will be easy for us to rapidly achieve initial registration. In Section 3, we introduced the initial world coordinate W and defined it as the coordinate of F 0 , while all the transformation matrices that we have obtained map to F 0 . First, we take F 1 as an example. Provided that we want to let point cloud P 1 of F 1 be transformed to F 0 , we can use the following formula: Here, P 0 1 is the transformed point cloud P 1 in the F 0 coordinate system. When the point clouds at all the views are processed by Equation (12), we can obtain an initial reconstructed 3D scene model P w = P 0 , P 0 1 , P 0 2 , . . . , P 0 N , where N represents the number of multiple views. Furthermore, an optional fine registration can be used to reduce the convergence error between the cloud point of P 0 and that of P 1 . In other words, we conduct fine registration by using some existing local registration methods, such as ICP. After refining the transformation matrix, the new point cloud will be generated as shown below, whereP 0 1 is the point cloud P 1 transformed to the F 0 coordinate after local registration. T (1) 0 is a refined transformation matrix that is generated with the fine registration of point cloud P 0 1 which has already been subjected to the initial registration. If the point clouds at all the views are treated according to Equations (12) and (13) and then combined into the same coordinate of F 0 , we can finally obtain a better reconstructed 3D scene modelP w = P 0 ,P 0 1 ,P 0 2 , . . . ,P 0 N .

Experimental Results
Because the proposed algorithm is based on the combination of pan-tilt camera control and coordinate transformation to perform point cloud alignment, we did not use the existing public database in the experiment due to the requirement of offline calibration. A RGB-D camera and a pan-tilt platform were used in the experiment. We used the ASUS Xtion Pro RGB-D camera (ASUSTek Computer Inc., Taipei, Taiwan) [39] and mounted the RGB-D camera on the FLIR E46-17 pan-tilt platform (FLIR Motion Control System Inc., Goleta, CA, USA) [40], as shown in Figure 7. Table 2 shows the specifications of the FLIR E46-17 pan-tilt platform, which was used to drive the RGB-D camera for capturing indoor scene point clouds. Two state-of-the-art methods were employed in the experiment to compare with the proposed method. The first one is the fast global registration (FGR) method proposed in Reference [34], and the second one is the Super4PCS method proposed in Reference [31]. These two methods do not require offline calibration and usually can provide robust and accurate registration results. Moreover, we also employed the ICP algorithm as the refinement process for all methods used in the experiment. All experiments were carried out on an Ubuntu 14.04 notebook computer equipped with Intel Core i7-3530M CPU and 8 GB memory.    Figure 8 presents the comparison results between the proposed method (Equation (11)) and the   Figure 8 presents the comparison results between the proposed method (Equation (11)) and the forward kinematics method (Equation (4)). Visually observing Figure 8 finds that both methods provide similar point cloud registration results. Therefore, the correctness of the forward kinematics Equation (4) is verified. Figure 9 shows the experimental results of point cloud registration obtained from the proposed and the state-of-the-art methods. In this experiment, the minimum distance between the scene and the camera is about 1 m. In Figure 8, the pictures listed in the left column are two original point clouds captured from different viewers in the same scene. We can see from Figure 9 that all methods succeed in aligning the two point clouds. The pictures listed in the second column are the results of the proposed method, which illustrate that an initial registration result (the upper second column) can be obtained efficiently. The third column of the pictures presents the registration results of the Super4PCS method. Only using Super4PCS also can obtain an initial alignment result (the upper third column), which can be further coupled with the ICP method to make a refinement result (the lower third column). The right column of the pictures presents the registration results of the FGR method. The initial alignment result of the FGR method is shown in the upper right column, and it also can be improved by the ICP method to make the registration result better (the lower right column). Figures 10-12 show the experimental results of three different rooms obtained from the proposed and the compared methods. One can see that the proposed method still provides a competitive point cloud registration performance in comparison to the Super4PCS and FGR methods. Figure 13 shows the registration results of the Room 1, Room 2, and Room 3 datasets obtained from the proposed method; each dataset contains 15 point clouds captured in different camera views that need to be aligned. One can see that the initial registration results of the proposed method are very close to the ICP refined ones. Therefore, the above experimental results validate the efficiency and robustness of the proposed method. 1st point cloud data Proposed method Super4PCS [31] FGR [34] 2nd point cloud data Proposed method + ICP Super4PCS + ICP FGR + ICP Figure 8. Comparison results between the proposed method and the forward kinematics method. Figure 8. Comparison results between the proposed method and the forward kinematics method.

Quantitative Evaluation
To quantitatively evaluate the registration performance of the proposed and the compared methods, we employed the root mean square (RMS) metric [41] in the experiment as follows: ( d (14) where N ℜ ∈

Quantitative Evaluation
To quantitatively evaluate the registration performance of the proposed and the compared methods, we employed the root mean square (RMS) metric [41] in the experiment as follows: where d min ∈ N is a vector containing the minimum distance (in millimeters) of N closest points between two aligned data scenes, and d min (i) denotes the ith element of the vector d min . To evaluate the registration performance of the proposed aligned transformation matrixT 0 i , Table 3 records three online registration comparison results between the Equations (4), (8) and (11) solutions. It is clear that the solutions of Equations (4) and (8) produce registration results with higher RMS errors due to modeling uncertainties and camera calibration errors, respectively. In contrast, the solution of Equation (11) produces better registration results against the calibration errors. Therefore, the registration performance of the proposed formula in Equation (11) is validated. To investigate the effect of the minimum distance between the scene and the camera to the registration performance, we considered three cases, 1 m, 3 m and 5 m, in the experiment. Table 4 shows the average RMS errors of the registration results in these three cases. One can observe that the proposed method provides the best initial registration result for all three cases, followed by the FGR method and the Super4PCS method. After the fine registration process, the FGR method yields the best result, which is also very close to the refinement result of the proposed method. Furthermore, when the minimum distance between the scene and the camera increases from 1 m to 3 m, the RMS error of the proposed method just slightly increases about 0.2374 in average. In contrast, the average RMS errors of the two compared methods increase about 8.0790 and 0.4554 for the Super4PCS and the FGR methods, respectively. This implies that the proposed method is more robust to the variation of the distance between the scene and the camera when compared to the two state-of-the-art methods. However, when the minimum distance increases to 5 m, the average RMS errors of all methods increase rapidly above about 31. This is caused by the sensor limitation of the ASUS Xtion Pro RGB-D camera, whose effective sensing distance ranges from 0.8 m to 3.5 m. Therefore, in the case of the 5 m minimum distance, the captured point cloud has a large number of invalid data points, which significantly deteriorates the registration performance of the proposed and the compared methods. Table 4 also records the RMS results of the three registration results shown in Figures 10-12. It is clear from Table 4 that the proposed method produces the best initial registration results compared to the FGR method and Super4PCS method. On the other hand, the ICP refinement process only makes a minor improvement to the proposed method. Because the proposed method can provide accurate initial registration results, the ICP refinement process is an ignorable step in the proposed multi-view point cloud registration system.

Computational Efficiency
One major advantage of the proposed method is its high computational efficiency. Table 5 shows the average processing time of all the methods in the experiment. Observing Table 5 finds that the processing time of the proposed method is only about 4 ms regardless of the effect of distance and environment variations, which is much faster than the Super4PCS and the FGR methods to produce the initial registration result. However, the ICP refinement greatly increases the processing time of the proposed method. Therefore, without the ICP refinement process, the proposed method is able to accurately align multi-view point clouds in real time. Note that the average processing time of the proposed alignment calibration is about 52,914.98 ms. However, this step is only performed once in the offline calibration process.

Conclusions
In this paper, an efficient point cloud registration method is proposed to align multi-view point clouds based on a camera calibration technique. One advantage of the proposed method is that it converges quickly and only needs to perform point cloud transformations without any iterative process. The proposed method consists of an offline calibration and an online operation. In the offline calibration, the relationship between different fixed camera views is calibrated to form several specific coordinate transformation matrices. Moreover, an alignment calibration process is proposed to obtain more accurate transformation matrices against the calibration errors. In the online operation, the point clouds captured from these fixed camera views can be directly and accurately aligned through the corresponding transformation matrices. Experimental results show that the proposed method is able to produce accurate point cloud registration results, even if no fine registration process is performed. Furthermore, the proposed method can process in real time. The average processing time of the proposed method is only about 4 ms running on a commercial notebook computer to align two point clouds, which is much faster than the existing ICP-based point cloud registration approaches. These advantages can make the registration of multi-view point clouds possibly become an online process with high registration accuracy via a sequential panning-tilting capturing process of a single RGB-D camera.
In the future, the proposed method will be further extended to combine with different control platforms to facilitate a variety of 3D reconstruction applications, such as 3D dense-map building, object model reconstruction, etc.