Three-Dimensional Registration for Handheld Profiling Systems Based on Multiple Shot Structured Light

In this article, a multi-view registration approach for the 3D handheld profiling system based on the multiple shot structured light technique is proposed. The multi-view registration approach is categorized into coarse registration and point cloud refinement using the iterative closest point (ICP) algorithm. Coarse registration of multiple point clouds was performed using relative orientation and translation parameters estimated via homography-based visual navigation. The proposed system was evaluated using an artificial human skull and a paper box object. For the quantitative evaluation of the accuracy of a single 3D scan, a paper box was reconstructed, and the mean errors in its height and breadth were found to be 9.4 μm and 23 μm, respectively. A comprehensive quantitative evaluation and comparison of proposed algorithm was performed with other variants of ICP. The root mean square error for the ICP algorithm to register a pair of point clouds of the skull object was also found to be less than 1 mm.


Introduction
Three dimensional measurements are popular in computer vision owing to their applications in medical and scientific imaging, reverse engineering, security, cultural heritage, industrial inspection and 3D map building. Several techniques, e.g., laser ranging, structured light, and passive stereo vision can be utilized for 3D range data acquisition. As a result of the rapid development of these sensing techniques, scientists and researchers have taken great interest in the multiview 3D reconstruction of real objects. The general procedure of generating 3D models of an object includes the acquisition of 3D data from different viewpoints (partial 3D shapes of the object) and integration of these point clouds into a 3D model. The complete process to generate the 3D model from several partial views is known as multiview 3D reconstruction.
Approaches for 3D reconstruction [1][2][3] based on passive stereo vision have been proposed in the literature. These approaches pose the correspondence problem [4] when the scenes or images lack sufficient texture on the surface of the 3D object. This problem was resolved by using structured light techniques [5][6][7] in which a projector (or projection system) replaces one of the cameras in the stereo pair and a coded pattern is projected onto the 3D object. User may not be able to acquire the complete 3D model from the modeling system in a single measurement step owing to self-occlusion and a limited field of view and requires merging multiple views into a complete 3D model [8].
To merge different range images, we need to align these scans with respect to a common coordinate system using the process known as registration. Multi-view 3D registration is very popular due to its applications in different fields, e.g., human body detection, 3D object scanning, 3D localization and ego-motion estimation.
Multiview 3D reconstruction may be classified into two categories [9]: the first uses a fixed sensor for an object performing handheld motion, and the other uses a handheld scanner for a fixed object. Let us consider the handheld rotation of an object by a small angle, wherein the images of the object are captured by a fixed sensor; here the range images can be aligned using a refinement algorithm owing to the fixed camera coordinate system, whereas for the same rotation of handheld scanner, this assumption is not valid owing to large displacement of the object in two different views and to the change in camera coordinate system [9]. Hence the multiview registration approach to tackle handheld operation, consists of two stages, coarse registration and refinement using the iterative closest point (ICP) algorithm [8,9]. The coarse registration is needed to handle unstable handheld motion and especially for tackling large motion where ICP-based refinement does not perform well. The coarse registration is used to estimate the initial parameters of the camera pose and then the refinement technique is applied on the pair of coarsely registered 3D datasets. In case of the failure of the refinement stage owing to unstable handheld motion, the multiview 3D reconstruction utilizes the fast coarse registration stage. If the coarse registration transforms the 3D data using an accurate pose, the refinement stage starts registering the 3D point clouds again following a coarse-to-fine strategy [9].
Researchers have intensively studied the problem of registration of 3D shapes in the last two decades. Readers may find the details of these studies in reviews [10,11]. Registration problems can be classified into two categories: pairwise registration (local method) or multiview registration (global method) [12]. Pairwise registration may be defined as the registration of the overlapping views and the user may formulate the problem as the sum of squared distances provided that the 3D correspondences are known. The locally aligned range images using pairwise registration may be integrated into the 3D model, leading to loop closure problem, which may be resolved using global method known as multiview registration. The comparision of the local and global methods [13] is given under the following features: (1) Local methods perform the registration on the pair of point clouds in an iterative manner while the global methods consider all the point clouds matching key geometric features among them and generate an optimum solution using RANSAC (Random sample consensus) frame work; (2) Local methods need good initial solution for their better performance and global methods donot require any good initialization, but they face the problem of incorrect and insufficient matched features; (3) As the global methods suffer from the problem of incorrect and insufficient correspondences, local methods can be used to refine the registration yielded by the global methods.
In rigid registration, we can model the transformation between the point clouds using 6 degrees of freedom (DOF). Researchers employed the registration approaches based on either the Singular Value Decomposition (SVD) [14] or the Principal Component Analysis (PCA). The literature also reported the registration based on the advanced iterative scheme using the ICP algorithm [15]. Researchers have proposed several variants of ICP, which are non-linear ICP [16], generalized ICP [17], and non-rigid ICP [18]. The user may select any of these variants of the ICP algorithm depending on several characteristics, which are accuracy, convergence rate, robustness and computation time. All these characteristics depend on the application of interest, 3D data and the imaging environment.
In this paper, the registration approach for the 3D handheld profiling system based on stereo vision and multiple shot structured light is proposed. This system consists of a stereo camera and a non-calibrated projector [19] and finds application in the 3D modeling and the reconstruction of the 3D objects. The proposed approach can be divided into three steps i.e., the two view 3D reconstruction based on active stereo vision, estimation of the relative translation and rotation for different views using visual navigation and multi-view registration based on the ICP algorithm. The remainder of this paper is organized as follows: Section 2 describes the methodology of the proposed research. In Section 3, we discuss the experiments and results conducted using the proposed approach of the 3D registration and the 3D profiling system based on multiple shot structured light. Section 4 concludes this research and also provides the directions for future work.

Materials and Methods
The proposed method can be described into three stages: proposed approach, two view 3D reconstruction and multi-view 3D reconstruction.

Proposed Approach
The proposed handheld profiling system consists of a stereo camera and a non-calibrated illumination projector employed for 3D modelling, which is different from camera-projector based systems [4,19]. The 3D sensing systems [20][21][22][23][24] are related to the proposed 3D sensing system, but are employed for single view geometry. We have previously reported a procedure for the 3D reconstruction for variable zoom using stereo vision and structured light [25], but it was based only on the single view 3D reconstruction. The proposed hardware of the handheld system comprises of stereo camera and non-calibrated projector without zoom lenses, and the multiview 3D registration is proposed in this paper. This research is an extension of the previous work [25] on the 3D reconstruction; it enhances the accuracy of the 3D reconstruction using a multi-view procedure. The proposed approach belongs to the multiview 3D reconstruction consisting of the handheld system for a fixed 3D object. Owing to large motion and the change in camera coordinate system in our case, the multi-view registration based on coarse registration and pairwise ICP based final refinement is proposed. The final refinement based on the ICP algorithm depends on the coarse registration stage. If the coarse registration transforms the point clouds using accurate visual navigation parameters, the refinement stage further enhances the accuracy of the 3D model. The stereo camera system consisted of two cameras (acA2500-14gm, Basler, Exton, PA, USA). The projector used in this work was an mini beam PA75K (LG, Daejeon, Korea).
Passive stereo vision-based 3D imaging poses the correspondence problem owing to less texture in a scene. To solve this problem, structured light technique is used to create artificial texture in the resulting images [20]. The block diagram of the proposed approach is shown in Figure 1 which depicts the flow of different algorithms in this research. approach of the 3D registration and the 3D profiling system based on multiple shot structured light. Section 4 concludes this research and also provides the directions for future work.

Materials and Methods
The proposed method can be described into three stages: proposed approach, two view 3D reconstruction and multi-view 3D reconstruction.

Proposed Approach
The proposed handheld profiling system consists of a stereo camera and a non-calibrated illumination projector employed for 3D modelling, which is different from camera-projector based systems [4,19]. The 3D sensing systems [20][21][22][23][24] are related to the proposed 3D sensing system, but are employed for single view geometry. We have previously reported a procedure for the 3D reconstruction for variable zoom using stereo vision and structured light [25], but it was based only on the single view 3D reconstruction. The proposed hardware of the handheld system comprises of stereo camera and non-calibrated projector without zoom lenses, and the multiview 3D registration is proposed in this paper. This research is an extension of the previous work [25] on the 3D reconstruction; it enhances the accuracy of the 3D reconstruction using a multi-view procedure. The proposed approach belongs to the multiview 3D reconstruction consisting of the handheld system for a fixed 3D object. Owing to large motion and the change in camera coordinate system in our case, the multi-view registration based on coarse registration and pairwise ICP based final refinement is proposed. The final refinement based on the ICP algorithm depends on the coarse registration stage. If the coarse registration transforms the point clouds using accurate visual navigation parameters, the refinement stage further enhances the accuracy of the 3D model. The stereo camera system consisted of two cameras (acA2500-14gm, Basler, Exton, PA, USA). The projector used in this work was an mini beam PA75K (LG, Daejeon, Korea).
Passive stereo vision-based 3D imaging poses the correspondence problem owing to less texture in a scene. To solve this problem, structured light technique is used to create artificial texture in the resulting images [20]. The block diagram of the proposed approach is shown in Figure 1 which depicts the flow of different algorithms in this research.

Two View 3D Reconstruction
The two view 3D reconstruction approach is similar to the method described in [25], which utilizes the binary coded structured light and normalized cross correlation (NCC) for pattern projection and stereo matching respectively. The calibration object used in the current study was a 7 × 6 chessboard target. The algorithm in [26] was employed for corner detection of the chessboard target.
In the current research, the binary coded multiple shot structured light technique was used to acquire 3D scans for the handheld profiling system. A lot of research has been performed in the 3D handheld scanning field, but these approaches utilized the single shot structured light technique. To the best of our knowledge, there are no reports of handheld scanning approaches utilizing a multiple shot structured light technique. If the target 3D object is static and the application does not impose stringent constraints on the acquisition time, multiple-shot techniques can be used and may often result in more reliable and accurate results. However, if the target is moving, single-shot techniques have to be used to acquire a snapshot 3D surface image of the 3D object at particular time instance [27]. A high speed and low-cost approach for structured light pattern sequence projection using a fast rotating binary spatial light modulator is reported in [28]. This system has the capability to yield high accuracy measurements at 200 Hz of the projection frequency and 20 Hz of the 3D reconstruction rate. The research reported in [29] describes the system, which consists of a projector that is held in one hand and a fixed camera, that captures the 3D object's geometry in less than 1 s using pattern sequence projection and reconstructs it in less than 30 s on a desktop computer. This approach may also be extended to obtain the representation of a whole object and align the different view point clouds using the ICP algorithm [29]. According to the research studies in [28,29], handheld scanning based on multiple shot structured light is feasible if a special hardware is utilized and the pattern projection and capturing processes take less time compared to handheld motion. Our system is also different from that reported in [29]. The proposed hardware is also capable of projecting and capturing binary patterns in less than 1 s for single scan.
The system's calibration consists of two stages: pre-calibration based on Zhang's method [30] and stereo camera calibration based on linear least square technique [31]. Pre-calibration was performed using images captured at a specific distance from system to remove distortion from the images at different distances. The undistorted image data can be obtained using Equation (1) from [32]: where (x p , y p ) and (x d , y d ) are the corrected and distorted pixel coordinates, respectively. For two view 3D reconstruction, the fundamental matrix was estimated from Equation (2) using random sample consensus (RANSAC) algorithm [33]. Binary coded structured light was used to obtain the coded images of the stereo camera and NCC was utilized for stereo matching [25] in this research. Binary patterns were projected on the 3D projected and captured by the proposed hardware. The procedure for generation of binary coded images from the projected binary patterns for the handheld profiling system is as follows: (1) Nine images for each camera are loaded, which consist of one all-white, one all-black, and seven binary images; (2) The average of all-white and all-black images is determined and compared with the binary images; (3) Each pixel in the binary images is examined to determine whether it is illuminated or not by thresholding; (4) The code from all binary images is concatenated into a binary coded image: Epipolar geometry can be defined as the basic geometry of the stereo camera which describes the relationship between the image coordinates of the stereo pair. Some facts about the epipolar geometry [32] are listed as follows: (1)  is termed as epipolar constraint; (3) Epipolar constraint converts the two dimensional search for stereo matching into one dimensional along the epipolar lines provided that the epipolar geometry of the stereo camera is known; (4) Therefore, epipolar constraint reduces the computation expenses of the stereo matching and excludes the features that may result in false matches; (5) For the two feature points visible in the field of view of both cameras appearing in a specific order in the left image, the correspondences of these points in the right image will occur in the same order as of the left image.
The binary coded images were used in the stereo matching based on NCC and the whole images are processed to render the 3D point cloud for a single scan. Since this matching process consumes huge time, the region of interest (ROI) of the stereo pairs enclosing the 3D object was used for yielding the 3D data, which improves the computation time. We demonstrated the epipolar geometry between the binary coded images in the stereo pair shown in Figure 2. Figure 2b,c depict binary coded images of the left camera with the point indicated by black circle and the coded image of the right camera shows the epipolar line passing through the matched point between the stereo pair; whereas the actual skull object used for the 3D reconstruction is depicted in Figure 2a. image, the matched feature must be located along the corresponding epipolar line and this constraint is termed as epipolar constraint; (3) Epipolar constraint converts the two dimensional search for stereo matching into one dimensional along the epipolar lines provided that the epipolar geometry of the stereo camera is known; (4) Therefore, epipolar constraint reduces the computation expenses of the stereo matching and excludes the features that may result in false matches; (5) For the two feature points visible in the field of view of both cameras appearing in a specific order in the left image, the correspondences of these points in the right image will occur in the same order as of the left image. The binary coded images were used in the stereo matching based on NCC and the whole images are processed to render the 3D point cloud for a single scan. Since this matching process consumes huge time, the region of interest (ROI) of the stereo pairs enclosing the 3D object was used for yielding the 3D data, which improves the computation time. We demonstrated the epipolar geometry between the binary coded images in the stereo pair shown in Figure 2. Figures 2b,c depict binary coded images of the left camera with the point indicated by black circle and the coded image of the right camera shows the epipolar line passing through the matched point between the stereo pair; whereas the actual skull object used for the 3D reconstruction is depicted in Figure 2a.

Multiview 3D Reconstruction
Multi-view 3D reconstruction of the point cloud data consists of two stages: rough registration and fine registration based on the ICP algorithm. Rough registration is based on camera parameters estimated using visual navigation. For large and unstable handheld motion, a coarse-to-fine strategy is employed in multiview 3D reconstruction.
The visual navigation algorithm determines the relative rotation and translation parameters of a single moving camera using RANSAC-based homography estimation [32,33]. The projective mapping between the two images or planes is known as homography. The relationship between the image points in the source and destination images is expressed by a homography matrix 'H'. The Direct Linear Transform (DLT) algorithm can be used to estimate the matrix 'H' using sufficient number of matched features [34] as given below: where: 11

Multiview 3D Reconstruction
Multi-view 3D reconstruction of the point cloud data consists of two stages: rough registration and fine registration based on the ICP algorithm. Rough registration is based on camera parameters estimated using visual navigation. For large and unstable handheld motion, a coarse-to-fine strategy is employed in multiview 3D reconstruction.
The visual navigation algorithm determines the relative rotation and translation parameters of a single moving camera using RANSAC-based homography estimation [32,33]. The projective mapping between the two images or planes is known as homography. The relationship between the image points in the source and destination images is expressed by a homography matrix 'H'. The Direct Linear Transform (DLT) algorithm can be used to estimate the matrix 'H' using sufficient number of matched features [34] as given below: where: After dividing the first row and the second row of Equation (3) by the third row, we get the following two equations: Equations (4) and (5) can be written in matrix form as follows: The pose and position of a single moving camera may be determined via homography decomposition provided that the intrinsic camera matrix is known. The equations for the homography estimation and decomposition are as follows: where (u src , v src ) are the pixel coordinates of the source image; (u dst , v dst ) are the pixel coordinates of the destination image; H = [h 1 h 2 h 3 ] is a 3 × 3 homography matrix; r i is the i-th column of the 3 × 3 rotation matrix; h i is a 3 × 1 vector, i = 1 to 3; λ is the scaling factor; and M is the intrinsic matrix of the camera (known by calibration). The rough registration is based on the transformation of the point clouds into the coordinate of the reference view using the parameters from visual navigation. The mathematical equation is given below: where X i is the 3D point cloud of the i-th view, R i is the relative rotation between the i-th view and the reference view point cloud, T i is the relative translation between the i-th view and the reference view point cloud, and X i ref is the i-th point cloud transformed into the reference view coordinate.
The points clouds were further refined using the ICP algorithm, which is a modified version of that presented in [16]. The ICP algorithm consists of an extrapolation step that traces out a path in the registration state space from the identity transformation toward a locally optimal shape match [15]. The extrapolation step results in reducing the number of iterations for fast convergence of the ICP algorithm. The mathematics of the proposed algorithm is similar to the algorithm presented in [16], but a number of modifications have been made. The ICP algorithm presented in [16] is based on minimizing the distance measure function derived from the definition of the 3D surface registration. This registration algorithm based on Levenberg Marquardt (LM) algorithm to solve least-square equations, is computationally expensive. To solve this problem, we did not employ the LM algorithm and an extrapolation step [15] has been added to further accelerate the proposed ICP algorithm. The proposed algorithm also consists of worst 10% of pairs-based outlier rejection method [35]. The algorithm in [15] needs a 3D model and a sensed model (data model) for the 3D registration, whereas the proposed algorithm does not need the 3D model of an object. The block diagram of ICP algorithm is depicted in Figure 3 using the reference view point cloud and the other viewpoint cloud.
Sensors 2018, 18, x 7 of 16 [15]. The extrapolation step results in reducing the number of iterations for fast convergence of the ICP algorithm. The mathematics of the proposed algorithm is similar to the algorithm presented in [16], but a number of modifications have been made. The ICP algorithm presented in [16] is based on minimizing the distance measure function derived from the definition of the 3D surface registration. This registration algorithm based on Levenberg Marquardt (LM) algorithm to solve least-square equations, is computationally expensive. To solve this problem, we did not employ the LM algorithm and an extrapolation step [15] has been added to further accelerate the proposed ICP algorithm. The proposed algorithm also consists of worst 10% of pairs-based outlier rejection method [35]. The algorithm in [15] needs a 3D model and a sensed model (data model) for the 3D registration, whereas the proposed algorithm does not need the 3D model of an object. The block diagram of ICP algorithm is depicted in Figure 3 using the reference view point cloud and the other viewpoint cloud. The final refinement stage based on the ICP algorithm yields good accuracy if the initial estimation of pose for coarse registration is also accurate. If the initial pose has good accuracy, the refinement stage starts registering the 3D point clouds again following a coarse-to-fine strategy and yielding high registration accuracy. The block diagram of the algorithm for the formation of 3D mesh is shown in Figure 4.  The final refinement stage based on the ICP algorithm yields good accuracy if the initial estimation of pose for coarse registration is also accurate. If the initial pose has good accuracy, the refinement stage starts registering the 3D point clouds again following a coarse-to-fine strategy and yielding high registration accuracy. The block diagram of the algorithm for the formation of 3D mesh is shown in Figure 4. [15]. The extrapolation step results in reducing the number of iterations for fast convergence of the ICP algorithm. The mathematics of the proposed algorithm is similar to the algorithm presented in [16], but a number of modifications have been made. The ICP algorithm presented in [16] is based on minimizing the distance measure function derived from the definition of the 3D surface registration. This registration algorithm based on Levenberg Marquardt (LM) algorithm to solve least-square equations, is computationally expensive. To solve this problem, we did not employ the LM algorithm and an extrapolation step [15] has been added to further accelerate the proposed ICP algorithm. The proposed algorithm also consists of worst 10% of pairs-based outlier rejection method [35]. The algorithm in [15] needs a 3D model and a sensed model (data model) for the 3D registration, whereas the proposed algorithm does not need the 3D model of an object. The block diagram of ICP algorithm is depicted in Figure 3 using the reference view point cloud and the other viewpoint cloud. The final refinement stage based on the ICP algorithm yields good accuracy if the initial estimation of pose for coarse registration is also accurate. If the initial pose has good accuracy, the refinement stage starts registering the 3D point clouds again following a coarse-to-fine strategy and yielding high registration accuracy. The block diagram of the algorithm for the formation of 3D mesh is shown in Figure 4.

Experiments and Results
This section describes the experiments and results of this research. Two objects were selected for the 3D reconstruction and were placed at 50 cm from the handheld profiling system. These objects included a skull, which was reconstructed as the qualitative demonstration of the 3D modeling, and a box, which was used to quantitatively analyze the 3D reconstruction results. Our previous research [25] has described the details of the experimental setup and has also mentioned the working distances and illumination patterns. That description is also applicable to this handheld scanning research. The studies in [25] utilized a zoom lens while the current research did not employ any zoom lens. The current research is based on multiview geometry while the study in [25] was based on single view geometry.
First, the 3D reconstruction of the skull object, shown in Figure 5a, was carried out and the raw 3D point cloud was further processed using the Geomagic Control software (3D Systems, Inc., Rock Hill, SC, USA). The results of the 3D reconstruction of the skull before and after the post-processing of the point cloud are shown in Figure 5b,c. A single view 3D scan of the skull shows good quality of the point cloud with the preservation of features, the shape of the 3D object is also visible as depicted in Figure 5c and also in Figure 9a below, which shows the result of the mesh of another single view 3D point cloud of the skull.

Experiments and Results
This section describes the experiments and results of this research. Two objects were selected for the 3D reconstruction and were placed at 50 cm from the handheld profiling system. These objects included a skull, which was reconstructed as the qualitative demonstration of the 3D modeling, and a box, which was used to quantitatively analyze the 3D reconstruction results. Our previous research [25] has described the details of the experimental setup and has also mentioned the working distances and illumination patterns. That description is also applicable to this handheld scanning research. The studies in [25] utilized a zoom lens while the current research did not employ any zoom lens. The current research is based on multiview geometry while the study in [25] was based on single view geometry.
First, the 3D reconstruction of the skull object, shown in Figure 5a, was carried out and the raw 3D point cloud was further processed using the Geomagic Control software (3D Systems, Inc., Rock Hill, SC, USA). The results of the 3D reconstruction of the skull before and after the post-processing of the point cloud are shown in Figures 5b,c. A single view 3D scan of the skull shows good quality of the point cloud with the preservation of features, the shape of the 3D object is also visible as depicted in Figure 5c and also in Figure 9a below, which shows the result of the mesh of another single view 3D point cloud of the skull. We also performed the experiment using the box object, shown in Figure 5d, to evaluate the accuracy of a single view 3D reconstruction by measuring the height and length of the same object as the accuracy in a single view 3D reconstruction directly corresponds to the accuracy of the 3D modeling. Therefore, the paper box was placed at 50 cm from handheld profiling system. Figures 5e,f show the 3D reconstruction result of the box before and after post-processing of the point cloud. We also performed the experiment using the box object, shown in Figure 5d, to evaluate the accuracy of a single view 3D reconstruction by measuring the height and length of the same object as the accuracy in a single view 3D reconstruction directly corresponds to the accuracy of the 3D modeling. Therefore, the paper box was placed at 50 cm from handheld profiling system. Figure 5e,f show the 3D reconstruction result of the box before and after post-processing of the point cloud. Table 1 shows the accuracy of the dimensions of the paper box object i.e., the mean errors in height and length of the box were found to be 9.4 µm and 23 µm, respectively, which demonstrate the good accuracy of the single view 3D reconstruction. Mean measured value is the mean of a specific number of manual measurements of the dimension of the paper box in Geomagic Verify viewer. Mean error is the difference between the original value and the mean measured value. This procedure is to show the quantitative evaluation of the accuracy of the single view 3D reconstruction and it is not related to ICP.  Figure 6 also shows the quality of the 3D reconstruction of the box object in Figure 6a. The result of the visualization of the point cloud of the box object for another single view is shown in Figure 6b, whereas Figure 6c depicts the refined mesh of the same view of the box object.
Sensors 2018, 18, x 9 of 16 Table 1 shows the accuracy of the dimensions of the paper box object i.e., the mean errors in height and length of the box were found to be 9.4 μm and 23 μm, respectively, which demonstrate the good accuracy of the single view 3D reconstruction. Mean measured value is the mean of a specific number of manual measurements of the dimension of the paper box in Geomagic Verify viewer. Mean error is the difference between the original value and the mean measured value. This procedure is to show the quantitative evaluation of the accuracy of the single view 3D reconstruction and it is not related to ICP.  Figure 6 also shows the quality of the 3D reconstruction of the box object in Figure 6a. The result of the visualization of the point cloud of the box object for another single view is shown in Figure 6b, whereas Figure 6c depicts the refined mesh of the same view of the box object. For the registration of the point clouds, the two view 3D reconstructions were performed using the skull for different views performing a handheld motion. A visualization of the two-point clouds of skull before and after applying the ICP algorithm shown in green and blue colors is depicted in Figure 7 for three pairs of roughly registered point clouds, which demonstrates final refinement using the ICP algorithm and further enhancement of the shape of the skull. Each pair consists of a reference view point cloud and a roughly registered view with respect to reference view via coarse registration stage. The root mean square error (RMS) for the three pairs of point cloud registered using the ICP algorithm is shown in Table 2; the table also shows the average of number of the 3D points of the two point clouds in each pair. The accuracy for the ICP algorithm-based final refinement in RMS is found to be less than 1 mm. For the registration of the point clouds, the two view 3D reconstructions were performed using the skull for different views performing a handheld motion. A visualization of the two-point clouds of skull before and after applying the ICP algorithm shown in green and blue colors is depicted in Figure 7 for three pairs of roughly registered point clouds, which demonstrates final refinement using the ICP algorithm and further enhancement of the shape of the skull. Each pair consists of a reference view point cloud and a roughly registered view with respect to reference view via coarse registration stage. The root mean square error (RMS) for the three pairs of point cloud registered using the ICP algorithm is shown in Table 2; the table also shows the average of number of the 3D points of the two point clouds in each pair. The accuracy for the ICP algorithm-based final refinement in RMS is found to be less than 1 mm.
Sensors 2018, 18, x 9 of 16 Table 1 shows the accuracy of the dimensions of the paper box object i.e., the mean errors in height and length of the box were found to be 9.4 μm and 23 μm, respectively, which demonstrate the good accuracy of the single view 3D reconstruction. Mean measured value is the mean of a specific number of manual measurements of the dimension of the paper box in Geomagic Verify viewer. Mean error is the difference between the original value and the mean measured value. This procedure is to show the quantitative evaluation of the accuracy of the single view 3D reconstruction and it is not related to ICP.  Figure 6 also shows the quality of the 3D reconstruction of the box object in Figure 6a. The result of the visualization of the point cloud of the box object for another single view is shown in Figure 6b, whereas Figure 6c depicts the refined mesh of the same view of the box object. For the registration of the point clouds, the two view 3D reconstructions were performed using the skull for different views performing a handheld motion. A visualization of the two-point clouds of skull before and after applying the ICP algorithm shown in green and blue colors is depicted in Figure 7 for three pairs of roughly registered point clouds, which demonstrates final refinement using the ICP algorithm and further enhancement of the shape of the skull. Each pair consists of a reference view point cloud and a roughly registered view with respect to reference view via coarse registration stage. The root mean square error (RMS) for the three pairs of point cloud registered using the ICP algorithm is shown in Table 2; the table also shows the average of number of the 3D points of the two point clouds in each pair. The accuracy for the ICP algorithm-based final refinement in RMS is found to be less than 1 mm.  In order to evaluate the ICP algorithm quantitatively and compare it with other variants of ICP, we followed some of the procedures related to [35,36]. We compared the proposed ICP algorithm with other variants using the 3D point clouds produced by the 3D handheld scanning system based on multiple shot structured light and analyzed the accuracy, convergence behavior, speed and the robustness of the algorithms. Figure 8a shows the convergence behavior of the ICP variants for outlier rejection schemes i.e., worst 10% of pairs (proposed one), edge rejection and no outlier rejection. The graph shows that the edge rejection outperforms the other schemes while worst pair rejection performs close to the edge rejection scheme. Five matching strategies, i.e., brute force matching, K-D tree matching, K-D tree and extrapolation (proposed one), Delaunay matching and Levenberg Marquardt (LM) with K-D tree, were compared using the 3D data produced by the proposed system in Figure 8b for convergence behavior. The results show that LM algorithm with K-D tree performed better than the other strategies, while the convergence behavior of K-D tree with extrapolation was close to the LM (K-D tree) and other variants. The overshoot observed in case of the K-D tree with extrapolation is mainly due to extrapolation step [35]. Five error metrics, point to point, point to plane, point to point with extrapolation, point to plane with extrapolation (proposed one) and point to point using LM algorithm, were comparatively analyzed in Figure 8c. The results demonstrate that the convergence behavior of the extrapolated point to point and point to plane error metrics are the same or better than those of the other metrics, while the point to point and the point to point with LM algorithm performed better than the point to plane metrics. The convergence behaviour of the point to point with LM and the point to point metric is same as the point clouds have good overlap.  In order to evaluate the ICP algorithm quantitatively and compare it with other variants of ICP, we followed some of the procedures related to [35,36]. We compared the proposed ICP algorithm with other variants using the 3D point clouds produced by the 3D handheld scanning system based on multiple shot structured light and analyzed the accuracy, convergence behavior, speed and the robustness of the algorithms. Figure 8a shows the convergence behavior of the ICP variants for outlier rejection schemes i.e., worst 10% of pairs (proposed one), edge rejection and no outlier rejection. The graph shows that the edge rejection outperforms the other schemes while worst pair rejection performs close to the edge rejection scheme. Five matching strategies, i.e., brute force matching, K-D tree matching, K-D tree and extrapolation (proposed one), Delaunay matching and Levenberg Marquardt (LM) with K-D tree, were compared using the 3D data produced by the proposed system in Figure 8b for convergence behavior. The results show that LM algorithm with K-D tree performed better than the other strategies, while the convergence behavior of K-D tree with extrapolation was close to the LM (K-D tree) and other variants. The overshoot observed in case of the K-D tree with extrapolation is mainly due to extrapolation step [35]. Five error metrics, point to point, point to plane, point to point with extrapolation, point to plane with extrapolation (proposed one) and point to point using LM algorithm, were comparatively analyzed in Figure 8c In order to evaluate the speed of the proposed algorithm with other ICP variants, matching strategies, brute force matching, K-D tree matching, K-D tree and extrapolation (proposed one), Delaunay matching and Levenberg Marquardt (LM) with K-D tree, were compared using the two 3D point clouds of each of 40-42 k data points shown in Figure 8d. The graph concludes that the brute force matching is the most computationally expensive matching scheme and the performance of the K-D tree based matching schemes is similar. Among the K-D tree-based schemes, K-D tree with extrapolation outperforms the other matching strategies. Since the point clouds have good overlap, K-D tree with LM algorithm performs similar to the K-D tree matching. In order to evaluate the accuracy of the proposed algorithm and ICP variants, we fixed the Handheld scanner on a rotational In order to evaluate the speed of the proposed algorithm with other ICP variants, matching strategies, brute force matching, K-D tree matching, K-D tree and extrapolation (proposed one), Delaunay matching and Levenberg Marquardt (LM) with K-D tree, were compared using the two 3D point clouds of each of 40-42 k data points shown in Figure 8d. The graph concludes that the brute force matching is the most computationally expensive matching scheme and the performance of the K-D tree based matching schemes is similar. Among the K-D tree-based schemes, K-D tree with extrapolation outperforms the other matching strategies. Since the point clouds have good overlap, K-D tree with LM algorithm performs similar to the K-D tree matching. In order to evaluate the accuracy of the proposed algorithm and ICP variants, we fixed the Handheld scanner on a rotational stage and acquired the 3D point clouds at five angles with 2 degrees difference between the consecutive point clouds. After applying the rough registration, we applied the different strategies for ICP, i.e., point to point, point to plane, point to plane with 10% worst rejection (proposed one), point to point with edge rejection, point to plane with edge rejection and point to point with LM algorithm (with edge rejection). The results were recorded in Table 3 and the best angle measurements are shown in bold. In case of the point to point based strategies, the point to point based angle measurements were improved using edge rejection and LM algorithm. While the point to plane with edge rejection performs better than the point to plane and point to plane with 10% worst pair rejection. Finally, the point to plane with edge rejection based angle measurements are more accurate than those with point to point with edge rejection. To evaluate the robustness of the proposed algorithm, we performed the acquisition of the point clouds at 8 to −8 degree with the step size of 2 and the point cloud at 0 degree is taken as the reference point cloud for all the other point clouds [36,37]. We compared error metrics with rejection strategy, i.e., point to point, point to plane, point to plane with 10% worst rejection (proposed one), point to point with edge rejection, point to plane with edge rejection, and point to point with LM and edge rejection as shown in Figure 8e. The results show that the point to plane with 10% worst rejection outperforms the other error metrics and follows a symmetry on either side of 'zero' degree position. The point to point, the point to point with LM and point to plane error metrics with edge rejection performed better than the point to point and point to plane error metrics without edge rejection on the point clouds at positive angles. The 3D mesh for one view and the integration of five views after applying the ICP algorithm are shown in Figure 9a-c. Figure 9a shows the visualization of a single view mesh depicting the good quality of the single view 3D reconstruction, whereas the mesh of the integration of five views before and after refinement using the MeshLab software (University of Pisa, Italy) [38], is depicted in Figure 9b,c. The results of the 3D mesh generation shown in Figure 9a,b, demonstrate the difference between the mesh of the single view point cloud and mesh of the integration of five point clouds in term of holes. The holes in Figure 9a have been compensated as shown in Figure 9b using the proposed multiview 3D reconstruction. In order to find the surface divergence between the merged point clouds (five point clouds) and the 3D model of the skull phantom, we generated the 3D replica of the skull phantom using a 3D scanner (DAVID SLS-3; Hewlett-Packard, Palo Alto, CA, USA) having an accuracy of 50 µm and we performed the comparison between the merged point clouds with the 3D model using CloudCompare software [39]. The 3D Scanner used as a reference platform is the structured light 3D scanner (DAVID SLS-3). The specifications of this scanner are as follows:

Conclusions
In this paper, we have implemented a 3D handheld profiling system based on multiview stereo vision and multiple shot structured light. The system consists of the handheld profiling system using a stereo camera and a non-calibrated projector. Single view 3D reconstruction approach based on binary coded structured light and NCC was utilized to get the point clouds of different views.
A rough registration of multiple point clouds was performed using the relative orientation and translation parameters estimated via homography-based visual navigation. The registered point clouds were further refined using the ICP algorithm. The system was tested using an artificial human skull and a paper box object to demonstrate the qualitative and quantitative analysis of the 3D reconstruction. For the quantitative evaluation of the proposed system, a paper box was reconstructed and errors in its height and breadth were found to be 9.4 µm and 23 µm, respectively. A comprehensive quantitative evaluation and comparision of the proposed algorithm was performed with other variants of ICP. The proposed ICP algorithm was found to be comparable to the other variants of ICP. The mean distance between the merged point clouds and the 3D model was 0.94 mm while the standard deviation was found to be 0.15 mm. Future research directions include the modelling of human body parts and the utilization of a single shot binary pattern to reduce the processing time. The processing time can be further reduced using parallel processing.