High-Accuracy Globally Consistent Surface Reconstruction Using Fringe Projection Profilometry

This paper presents a high-accuracy method for globally consistent surface reconstruction using a single fringe projection profilometry (FPP) sensor. To solve the accumulated sensor pose estimation error problem encountered in a long scanning trajectory, we first present a novel 3D registration method which fuses both dense geometric and curvature consistency constraints to improve the accuracy of relative sensor pose estimation. Then we perform global sensor pose optimization by modeling the surface consistency information as a pre-computed covariance matrix and formulating the multi-view point cloud registration problem in a pose graph optimization framework. Experiments on reconstructing a 1300 mm × 400 mm workpiece with a FPP sensor is performed, verifying that our method can substantially reduce the accumulated error and achieve industrial-level surface model reconstruction without any external positional assistance but only using a single FPP sensor.


Introduction
Fringe projection profilometry provides a convenient way to measure dense and accurate three dimensional (3D) surface point cloud of target objects. It plays an increasingly important role in various fields such as industrial quality inspection, prototyping, culture heritage preservation and movie industry [1][2][3][4][5]. Owing to the limited field of view (FOV) and object self-occlusion, 3D point cloud obtained from a single viewpoint only contains partial surface shape data. To reconstruct complete surface models, 3D measurements from multiple viewpoints are deserved to cover the whole object, and their sensor poses need to be precisely tracked to further transform these partial surface point clouds into a global coordinate system [6][7][8][9].
Existing sensor pose tracking solutions are mostly based on external assistance methods, such as attaching artificial markers or using external positional equipment such as a laser tracker or optical coordinate measuring machines (CMMs) [10], their usage flexibility is inherently limited. Alternatively, sensor poses can also be directly estimated by using 3D registration techniques [11][12][13] to compute the relative pose between sequential two measurements. However, sensor pose estimation drifts inevitably exist due to 3D registration inaccuracy. Small sensor pose estimation error which may seem negligible on a local scale, can drastically accumulate along a long scanning trajectory [12,14]. The accumulated error directly leads to surface point clouds inconsistency between the first and last scans and finally breaks the reconstruction result.

Measurement Principle
In our FPP sensor, a series of sinusoidal fringes along the horizontal axes of projector image frame with constant phase shifting are projected onto a target object, and two cameras capture the distorted fringe images synchronously. The captured images can be expressed as: where (x, y) is the pixel coordinates and is omitted in the following expression, I i denotes the recorded intensity, A i indicates the average intensity, B i represents the modulation intensity, δ i is the constant phase-shift, n is the phase shift number, and φ is the desired phase information. By solving Equation (1), the phase value φ can be obtained according to: The arctangent function in Equation (2) will result in a phase value within the range of [−π, π] with 2π discontinuities. In our sensor, multi-frequency heterodyne technology is adopted to construct the continuous phase map [21], so that the correspondence between two camera views can be established unambiguously. Finally, the 3D result can be obtained according to the pre-calibrated camera intrinsic and external parameters. The measurement principle of the FPP sensor is shown in the Figure 1 below.

Relative Sensor Pose Estimation
The relative sensor pose estimation between sequential two measurements (also called as frames in the following) is the basis to obtain the initial global sensor pose estimation of each measurement. In this section, we will introduce the proposed method which estimates the relative sensor pose (a rigid transformation) by 3D registering two depth maps to jointly optimize the dense geometric and curvature inconsistency errors. The whole process is conducted by first computing the curvature map of each depth map, and then iteratively performing data association and error minimization steps.

Curvature Map Estimation
Similarly to depth map (also called as depth image), curvature map is a 2D image in which the value of each pixel is the surface curvature value instead of the depth value. Specifically, for each pixel x = (u, v) in the depth map with valid depth z(x), its corresponding 3D point coordinate p(x) can be computed using the inverse of projection function Π(·) as: where f x , f y are the focal lengths and c x , c y are the principle point, respectively. The mean curvature of each point on the surface is represented using a surface variation notion in [22]. Hence, the surface curvature value κ(x) at pixel x is estimated by taking the eigen-analysis of the covariance matrix of the local neighbor points of point p(x). The covariance matrix is defined as: where p i is one of the nearest neighbor points of p(x). Then κ(x) can be computed as: where λ 0 ≤ λ 1 ≤ λ 2 are the eigenvalues of the covariance matrix C(x).
To speed up the nearest neighbor search, we take advantage of the organized point cloud structure embedded in the depth map, only taking adjacent pixels as candidate neighbors. Meanwhile, the geometric continuity constraints are also considered to filter the potential depth gaps by specifying a maximum allowed distance. Pixel x i is the nearest neighbor of pixel x, only when it satisfies x − x i ≤ σ 1 , and p(x) − p(x i ) ≤ σ 2 , where σ 1 and σ 2 represent the pixel and point nearest neighbor distance threshold, respectively. In this paper, we set σ 1 = 3 and σ 2 = 1.1 mm (with average point cloud density as 0.275 mm) to allow approximate 30 nearest neighbor points for curvature value estimation. Figure 2a shows a depth map measured with the FPP sensor, Figure 2b shows the estimated curvature map using our method. Figure 2c is the corresponding 3D point cloud whose color is mapped from the curvature map, and the local detail is displayed in Figure 2d. It can be seen that the estimated curvature map exhibits high consistency with the point cloud surface variation. Furthermore, by carefully handling the discontinuous boundary case, the curvature values at boundary points can also be robustly estimated, as shown in Figure 2d.

Data Association
Data association is to identify the corresponding points between two sequential frames, the correspondence set is then fed to the optimization process to find the optimal relative sensor pose estimation. Assuming small camera motion between sequential frames, the projective data association algorithm [12] is conducted to produce the point correspondences set. Given the relative sensor pose estimation T i−1,i between current frame f i and its previous frame f i−1 , then for each pixel x with valid depth in f i , we first transform its corresponding 3D point p(x) into the local coordinate system of previous frame f i−1 as T i−1,i p(x) = (x, y, z) . Then the corresponding pixel of x in frame f i−1 can be computed with perspective projection: where K is the camera intrinsic matrix. Note that for simplicity of notation, we omit the conversions between vectors and its homogeneous vectors throughout this paper.
With the projective data association, multiple pixels in source depth image f i may correspond to a common pixel in target depth image f i−1 . To solve the many-to-one problem, the z-buffer technique is adopted, for each pixel in target depth map f i−1 we only keep the corresponding pixel in source depth map f i with minimum depth. All corresponding points pairs together construct the corresponding set

Minimization
The relative sensor pose optimization function E reg is defined as: where E geo denotes the geometric inconsistency error, E cur denotes the curvature inconsistency error, λ is the weight of the curvature inconsistency error.
The geometric error is defined as the point-to-plane error [11] between current and previous frames: in which (x,x) is one corresponding pixels pair in the corresponding set K i−1,i , p i (x) is the local 3D point in the current frame f i , p i−1 (x) and n i−1 (x) are the corresponding 3D point and normal, respectively. T i−1,i is the current estimation of the relative sensor pose between the two frames. exp(ξ) ∈ SE(3) is the incremental transformation to be estimated in each iteration, in which ξ = (ω, t) = (α, β, γ, t x , t y , t z ) ∈ R 6 . The curvature inconsistency error E cur is defined as the curvature value inconsistency between the warped curvature map of current frame f i and the curvature map of previous frame f i−1 : where κ i (x) is the curvature value at pixel x of the current frame, κ i−1 (x) is the curvature value at pixel x of the previous frame. Assuming the incremental pose transformation exp(ξ) to optimize at each iteration is small, it can be linearized as exp(ξ) ≈ I +ξ, whereξ ∈ se(3) is the corresponding Lie algebra element: (3) is a linear skew-symmetric operator (see [23] for details).
With this linearization and simple notationṗ i−1 (x) = T i−1,i p i (x), the error term E geo becomes: where J geo is the Jacobian matrix and r geo is the residual vector. Similarly, the error term E cur becomes: With the above linearization, minimization of Equation (7) allows to solve the following linear system: In each iteration, we compute Jacobian J geo , J cur and residual r geo , r cur at current relative sensor pose estimation T i−1,i , and solve the linear system in Equation (13) to find the ξ that best satisfies the geometric and curvature consistency constraint. Then the relative pose T i−1,i is updated to exp(ξ)T i−1,i , and taken as the initialization for the next iteration.
When the optimization converges, the T i−1,i is taken as the final relative sensor pose estimation between two frames. We fix the sensor pose of the first frame f 1 as T 1 = I and regard it as the world coordinate system. Then the initial global sensor pose of frame f i is computed as Figure 3 shows the 3D registration results comparison between the proposed method and two other methods. The sensor pose estimation accuracy is directly reflected in the surface shape consistency of two registered point clouds. When independently visual inspecting each registration result, each method seems to converge to a correct result. However, when comparing the registration results between Figure 3b-d, it is not hard to see that the relative sensor pose estimation accuracy of our method outperforms the other two methods.  (8), (c) Point-to-plane ICP performed on 3D point cloud with a max distance threshold to eliminate outliers, (d) Minimizing both of the geometric error and curvature error as proposed in this paper. Figure 4a,b represents the curvature value difference map between source and target point cloud before and after the 3D registration, respectively. The curvature difference map is built on the target frame f i−1 , correspondences are built using the above data association method. Gray pixels indicate that no correspondence is built for these pixels. It can be seen that the curvature value difference from Figure 4a,b decreases dramatically over the whole map, which demonstrates the significance of introducing curvature map consistency into the 3D registration constraints.

Global Sensor Pose Optimization
Though fusing curvature consistency information improves the accuracy of the estimated relative sensor poses, the global sensor pose drift will inevitably accumulate during a long scanning process. To reduce the accumulated error and obtain globally consistent 3D models, successful relative pose estimation to much earlier frames (also called as building loop closure) is deserved. In this section, we will first introduce how to automatically build a series of loop closures with the proposed adaptive keyframe selection and the two-step checking method. We will then introduce our method which performs multi-view point cloud registration in a pose graph optimization framework [19].

Keyframe Selection
Detecting loop closure for every new-income measurement is not optimal; it will greatly increase the computation cost after a long time scanning. Therefore, we only detect loop closure for selected keyframes. We utilize 6-DOF (degree of freedom) pose distance metrics to determine when to add a new keyframe for further loop closure detection. For each new input frame f j , we evaluate the relative pose distances between it and the last added keyframe f k i−1 . In which, the rotation distance is measured as the rotation angle using the Rodrigues' formula: The translation distance is computed as: If either the rotation or translation distance exceeds its corresponding threshold σ R or σ t , the current frame f j is marked as a new keyframe f k i . We set σ R = 20 • , σ t = 130 mm in our paper. Figure 5 shows the keyframe selection results using the total 146 depth maps acquired with our FPP sensor (see Section 5). Gray points identify the 34 selected keyframes out of a total 146 depth maps.

Loop Closure Detection
For each new added keyframe f k i , we use a two-step checking scheme to detect whether it forms correct loop closures with previous keyframes. If two keyframes construct a loop closure, then they must fulfil: (1) the overlapping area between two point clouds is enough, (2) the mean absolute error (MAE) between them is small.
The overlapping area ratio is crucial for arbitrary two frames with loop closures, as small overlapping area ratios are prone to correspond to non-loop-closure connection. In this paper, we propose to use the projective association algorithm to efficiently compute the overlapping area ratio between two keyframes. When a new keyframe f k i arrives, we compute its depth valid map V k i for each pixel. V k i (x) = 1 for each pixel where its depth is valid, and V i (x) = 0 when depth is not valid. Then for a pair of keyframes f k i and f k j , we obtain the correspondence set K k i,j = {(x,x)} using the data association method in Section 3.2. Note that, the relative sensor pose between f k i and f k j is computed as T k i,j = T k i −1 T k j here. A correspondence pair (x,x) is identified as overlapped when V k j (x) = 1. We collect all overlapped point pairs, the overlapping ratio is computed as τ o = N/M, where N is the overlapped points number, M is the total number of points with valid depth. If the overlapping ratio τ o is larger than the threshold σ o , we mark keyframe f k i and f k j as a candidate loop closure. Figure 6a shows the overlapping ratios between the 34th keyframe (frame 145) with all its previous keyframes, we set the overlapping ratio threshold σ o = 0.65 in this paper. We select frame 36, 96, 120 and 140 to visualize the correctness of our proposed method as shown in Figure 6b, dotted line sketches the scanning path.
We then check the dense geometric consistency to further validate the correctness of these candidate loop closures. A candidate loop closure ( f k i , f k j ) is considered as reliable only if the MAE of the correspondence points between two frames is below a threshold σ r : If the two-step checks all passed, the two frames are further registered together to construct a loop closure.

Graph Based Sensor Pose Optimization
Removing the accumulated error to get globally consistent model needs to eliminate the surface inconsistencies across all associated point clouds. We define the surface inconsistency as a error term F i,j in terms of the dense geometric registration error between frame f i and f j , as: Note that T i , T j is obtained through the relative sensor pose estimation in Section 3.3, T j,i is obtained through the loop closure detection in Section 4.2. Inconsistency exists between T j,i and T i , T j due to the accumulated error. Line (17) holds by restricting the corresponding points (p i , p j ) must fulfil T j,i p i − p j < , we set = 1.0 mm in this paper.
Then by approximating T −1 i T j T j,i = I +ξ i,j , Equation (17) can be written as: in which ξ i,j actually measures the inconsistency between sensor pose T i , T j and their relative pose Equation (19) shows the surface inconsistency term F i,j can be represented with the sensor pose inconsistency term ξ i,j and a covariance matrix Ω i,j = ∑ p i ,p j G i G i , it is constant and can be pre-computed for each term during the 3D registration process.
Let C be the set of indices for which a connection between two sensor poses exists, then the multi-view point cloud registration problem can be formulated as: This exactly defines a pose graph optimization, which can be directly solved using the g2o library [19]. Figure 7 shows the pose graph constructed with our method. Vertices represent the 6-DoF sensor poses, edges represent the constraints between sensor poses. The pose graph is visualized with the g2o viewer software.

Experiment
In the experiment, a FPP sensor is constructed using (1) a Texas Instruments LighterCrafter4500 board (Texas Instruments, Dallas, TX, USA) for fringe patterns projection, (2) two Basler acA1300-30gm cameras (Basler AG, Ahrensburg, Germany) simultaneously capturing the modulated images with pixel resolution of 1296 × 966. The proposed method is validated by scanning a 1300 mm × 400 mm sheet metal using the FPP sensor as shown in Figure 8, the 3D measurement and model reconstruction are conducted on a desktop PC with a 3.3 GHz Intel Xeon CPU and 16 GB RAM. By moving the FPP sensor around, complete scan of the sheet metal with totally 146 frames (depth maps) acquired is accomplished. To test and verify the accuracy and effectiveness of the proposed relative sensor pose estimation method and the global optimization method, a ceramic ball bar is placed beside the measured sheet metal. The reconstruction accuracy can then be well examined by qualitatively observing the surface consistency and quantitatively analyzing the size fitting results of the reconstructed ceramic ball bar.

Relative Sensor Pose Estimation Accuracy
The accuracy of our proposed relative sensor pose estimation method is tested first. The sensor pose of each frame relative to the world coordinate system (frame 1) is separately estimated by (1) jointly optimizing the geometric and curvature consistency constraints (our method), (2) only optimizing the geometric consistency constraint for comparison. With the estimated sensor poses, 3D point cloud of each frame is transformed to the world coordinate system and further voxel downsampled to a unified 3D point cloud. Figure 9a shows the reconstructed surface of sheet metal with our method, it shows that the overall shape of our reconstruction result matches the actual sheet metal shape well. The point clouds are rendered with Open3D library [24]. On the other side, sensor pose estimation error inevitably accumulated in the reconstruction process, which leads to obvious surface shape artifacts, as shown in Figure 9b,c. In which, Figure 9b shows the local surface inconsistency at 3 difference places using our method, Figure 9c shows the corresponding results using only geometric consistency constraints. With this comparison, it is not hard to see that introducing the curvature consistency constraint effectively improves the sensor pose estimation accuracy, which provides a good foundation for further global optimization.  To further quantitatively analyze the accuracy improvement with the global optimization, we computed the relative translation and rotation changes of each keyframe pose before and after global optimization, as shown in Figure 11, the optimized poses are taken as the reference values here. It demonstrates that even very small translation estimation inaccuracy (less than 2.0 mm) and rotation estimation inaccuracy (less than 0.10 • ) in the reconstruction range of 1300 mm × 400 mm, are enough to cause obvious surface inconsistency (as shown in Figure 9b), and lead to reconstruction results that are unusable for high-accuracy dimensional inspection. Meanwhile, the absolute accuracy of the reconstructed surface model can be directly and precisely tested by comparing (1) diameter fitting values of two spheres, (2) standard deviation values of Euclidean distances between sphere surface 3D points and the fitted sphere surface, (3) Euclidean distance between two sphere centers. The comparison is made between the not-optimized model, globally-optimized model and the ground truth. The ground-truth is obtained with the fitting values of frame 130, because two spheres are both measured in this frame, the fitting values are only related to the measurement accuracy of our FPP sensor, and are not affected by any sensor pose estimation error. Specifically, for each kind of data source, we manually cropped the corresponding points that belong to the two sphere surfaces, and fitted the diameter and standard deviation values using the Geomagic software. Table 1 shows the comparison results of diameter and standard deviation fitting values of two spheres. The standard deviation values directly reflect the surface consistency of our reconstruction model. After the global optimization, it decreases from 0.1971 mm to 0.0282 mm for sphere 1, and decreases from 0.2534 mm to 0.0301 mm for sphere 2. Furthermore, the standard deviation value of globally-optimized model is very close to the value of a single measurement (frame 130), which demonstrates that our reconstructed surface exhibits very good shape consistency. We also compared the difference of the sphere center distances between not-optimized and globally-optimized models, as shown in Table 2. The absolute error of sphere center distance relative to the ground truth decreases from 0.2080 mm to 0.0205 mm, the relative error relative to the ground truth decreases from 0.1387% to 0.0137%. Both of the above two comparison results explain the surface shape inconsistency refinement from Figure 9a,b to Figure 10a,b, and illustrate that with the global optimization (1) the accumulated error is substantially reduced to less than 1/10 of the not-optimized reconstruction result, (2) the final sensor pose estimation accuracy can well match the measurement accuracy of our FPP sensor.

Conclusions
In this paper, we present a high-accuracy globally consistent surface reconstruction method using fringe projection profilometry. The accumulated sensor pose estimation error problem is solved with a first relative sensor pose estimation step and a following global sensor pose optimization step. The former step tries to reduce the accumulated error by maximizing the relative sensor pose estimation accuracy; it helps to ensure the initial sensor poses lie in the convergence basin of the following global optimization method. The latter step globally optimizes the sensor poses through a multi-view point cloud registration formulated in the pose graph optimization framework. Besides, adaptive keyframe selection and loop closure detection method are proposed to efficiently and automatically build point cloud connections and their relative pose constraints, which are the prerequisites of global sensor pose optimization. By qualitatively observing and quantitatively analyzing the reconstruction results of a 1300 mm × 400 mm workpiece, we validated the effectiveness and accuracy of our method.
Our method demonstrates the ability to accomplish industrial-level surface model reconstruction without any external positional assistance but only using a single FPP sensor.
Since our reconstruction method is based on 3D registration, it also shares some limitations similar to most 3D registration based surface reconstruction methods [7,12,16]. For example, when the target object is near a plane, 3D registration may not converge to a correct result due to insufficient geometric constraint [11], which will stop the sensor poses from being robustly tracked. A possible solution is to further exploit the usage of surface textures constraint to help the robust tracking of sensor poses.

Conflicts of Interest:
The authors declare no conflict of interest.